Why writing a linked list in safe Rust is so damned hard

ekidd · on Feb 23, 2018

Yeah, a doubly-linked list is basically the worst possible choice for your first program in Rust. This is because Rust likes all of your memory to have one clear owner. And this means that cyclic data structures are unusually difficult compared to almost any other first program you might choose.

But if you really want to know how to do it, there's a really good tutorial on the subject titled "Learning Rust With Entirely Too Many Linked Lists": http://cglab.ca/~abeinges/blah/too-many-lists/book/

This will walk you through multiple different kinds of linked lists, and show you how to implement each in Rust. Along the way, you'll learn more about Rust's ownership system than most working Rust programmers generally need to know.

Personally, I've written quite a bit of production Rust code, and I've done so without ever touching 'unsafe' or needing a linked list. If I really did need a linked list, I'd either grab a good one off crates.io, or just knock out a quick one using `unsafe` and raw pointers. I mean, Rust can do basically anything C does if I ask nicely. It's just that if I choose to write "C in Rust", then I get about same safety guarantees that C offers.

rusbus · on Feb 23, 2018

OP here. In hindsight, this is 100% correct. Hopefully this post will help people avoid my mistakes.

amelius · on Feb 23, 2018

Just out of curiosity, didn't you do a google-search for "linked list in Rust" when you ran into problems? Or isn't that your style? :)

Anyway, thanks for the thought-provoking post.

rusbus · on Feb 23, 2018

I did. I found the book of linked lists at the bottom of the post. After reading and appreciating the complexity I didn't want to deal with, I just wrote it in Go

littlestymaar · on Feb 24, 2018

No jugement here, but I curious to understand your situation: if you didn't want to deal with the complexity of the implementation, why didn't you just import a linked list from the standard library ? Did you just want to implement a linked list for learning, but still put the minimum effort in the process ?

rusbus · on Feb 26, 2018

The standard library linked list doesn't support inserting in the middle

IshKebab · on Feb 23, 2018

Yeah also even if you think you need a linked list, you probably don't. Vectors are faster in nearly every case, even ones that linked lists are designed to solve.

spockz · on Feb 23, 2018

I’m mightily curious to know how you decide which double linked list to get from crates, isn’t there a clear best implementation? Are the differences/trade offs described somewhere? And if you can get one from crates, why would you implement your own with unsafe at all? Wouldn’t using an existing C linked list be more convenient?

ekidd · on Feb 23, 2018

So the first question: Why are you using a doubly-linked list? :-)

Due to the way modern processors work, following long chains of pointers is massively expensive (unless can you carefully control where all of the memory was allocated). There's a whole section discussing this in the "Too Many Lists" book. Basically, following pointers to unpredictable places will eventually defeat your processor's caches. You could easily wind up running a thousand times slower than if you used the cache well.

So you first want to look at data structures which store many elements in a single block, so that they're adjacent in memory and you won't have to chase nearly as many pointers.

- If you just want an ordered group of elements, use a "Vec", which is similar to a C array.

- If you're going to add elements at the back, and remove them from the front, look for a queue. These can be implemented with ring buffers or slabs of memory containing multiple elements at a time. There's a couple in the standard library.

- If you need to keep elements in order and search for them, you might actually want a B-tree, which again uses slabs of memory. There's a nice BTreeMap in the standard library.

Basically, by the time you've analyzed your problem, there's a 95% chance that a doubly-linked list was just the wrong data structure. Cache locality just has too big an effect on performance.

But if you really need a linked list, go to https://crates.io/ and search for "linked list" or "doubly linked list." Flip through the first page or two, and look for something with lots of downloads and nice docs. Check if it supports the APIs you want for your use case.

Also take a look at petgraph, which is an awesome Rust graph library: https://crates.io/crates/petgraph This has a zillion downloads (by Rust data structure standards), plenty of reference docs, and good support. It knows about cache locality. And a linked list is really just a special case of a graph.

But basically, you shouldn't be using doubly-linked lists unless you know exactly why you need one. They're a nice teaching exercise (except in Rust or functional languages) but they're actually pretty specialized.

imtringued · on Feb 23, 2018

>So the first question: Why are you using a doubly-linked list? :-)

Honestly there is only one legitimate usecase for a linked list that can only be solved suboptimally with a continguous array. Removing elements in the middle of a list assuming you already know the pointer to the linked list node.

In a linked list you can just overwrite the previous node to link to the next node.

However what I do is I just swap the element with the last one in a vector and delete the last element. This doesn't preserve the order of the list but for me this has never been a significant issue.

Someone · on Feb 23, 2018

”there is only one legitimate usecase for a linked list that can only be solved suboptimally with a continguous array. Removing elements in the middle of a list assuming you already know the pointer to the linked list node.”

You will be surprised to see how large a memmove you can do on modern hardware in the time it would chase those pointers.

For an extreme example (moving array data gets slower if your elements grow larger), see https://youthdev.net/en/performance-of-array-vs-linked-list-...

vinkelhake · on Feb 24, 2018

The usecase imtringued was talking about:

> assuming you already know the pointer to the linked list node

For example a LinkedHashMap where a hash provides the primary means of looking up elements and the nodes contain an intrusive linked list. Removing an element from that list doesn't involve chasing pointers from the start of it.

The linked benchmark is for a different scenario.

saghm · on Feb 23, 2018

Not sure why you'd need to use something from crates.io; there's a doubly-linked list in the standard library[1]. That being said, the standard library documentation itself strongly suggests that you use a VecDeque in most cases[2].

[1]: https://doc.rust-lang.org/std/collections/struct.LinkedList.... [2]: https://doc.rust-lang.org/std/collections/index.html#use-a-l...

madez · on Feb 23, 2018

That Rusts compiler fights against intuitive doubly linked lists is reasonable, because they wouldn't work when working in parallel on the list, while Rust by default also ensures safety doing that.

Having that in mind makes it easier for me to come up with ways how to appease the compiler. What measures would you take to write a thread-safe doubly linked list in C? Would you make it entirely exclusive and blocking? Well, then you might want let one "list" structure own all nodes. Would you want to allow parallel execution of operations on the list? Well, then break the ownership of the nodes up into smaller parts that you can hand out in parallel when they don't affect each other. What granularity of allowed and prepared-for parallelism gives the best performance depends on the use-case.

kenhwang · on Feb 23, 2018

I think this is the point that's often overlooked. Everything in safe rust is supposed to be both memory safe and thread safe. The textbook implementation of linked lists and trees are not thread safe, and easily not memory safe.

bluejekyll · on Feb 23, 2018

I’ll be the pedantic one. It doesn’t guarantee thread safety, it guarantees that its data race free.

In practice this often reduces to the same thing, but you can still create deadlocks, etc.

madez · on Feb 23, 2018

Yes, you are correct. Rusts safety does not rule out deadlocks. Guaranteeing data-race-freeness is however already enough to reject intuitive doubly-linked lists. Thanks for bringing that up, which I don't think is pedantic.

zenhack · on Feb 23, 2018

Lots of (most?) rust libraries are "not thread safe" in the sense that those data structures aren't though. The reason this is ok is that to actually share a piece of data across threads it needs to implement Sync (which requires unsafe).

Also, as a sibling comment points out, the "thread safety" guarantee is relatively narrow -- it won't guarantee a general lack of concurrency bugs any more than a gc will guarentee reasonable memory usage.

I haven't thought through the details, but I suspect it's possible to adjust the semantics of rust in a way that allows multiple mutable references without sacrificing memory safety.

What you would hit is that the compiler would be hindered when doing optimizations in the same way that c and c++ compilers are, because of pointer aliasing.

There's a trade off there, and I think you can make the argument either way.

steveklabnik · on Feb 23, 2018

Sync is implemented automatically for most types; if you want to tell the compiler something is Sync when it thinks it isn’t, that’s when you need unsafe.

Same with Send, which is more primitive than Sync.

zenhack · on Feb 23, 2018

Ah, point. That would probably need to change to relax the one mutable reference constraint, if you wanted to.

littlestymaar · on Feb 24, 2018

> I haven't thought through the details, but I suspect it's possible to adjust the semantics of rust in a way that allows multiple mutable references without sacrificing memory safety.

Please do think about the details deeply, because if you find a way to do this, you'll probably bring a revolution to Rust.

I'm kind of skeptical of course, but I'd be really happy to be wrong.

cryptonector · on Feb 23, 2018

I'm not sure how to build a lock-less doubly-linked list in C, honestly, and it may not be possible. I've authored several lock-less data structures in C, but I've not thought about this one... A singly-linked list is easy-peasy for addition at the head or tail (just atomic reads and atomic compare-and-swap), though deletion is tricky (because the obvious CAS thing to do is racy if you'd free the deleted element, and you do want to free it eventually...).

A good pattern for thread-safety is to use immutable data structures. jq's jv API [0] is a great example of how to make it easy to deal with immutable data structures. Of course, you don't get to have cycles this way and so you don't get to have doubly-linked lists. Also, for root state you kinda need something like a thread-safe variable (of which there are a number of flavors in Clojure, Haskell, and others) where when you read it you're guaranteed to have a stable reference to the value, but also guaranteed that garbage will be collected -- this is great when the values are immutable data structures.

You could have cycles in immutable data structures if you install the cycle before ever sharing with other threads and can make sure you don't fall into a loop when releasing the immutable value. But this is the sort of thing that requires "unsafe mode" unless the compiler/runtime can figure out that a) you haven't shared the thing yet, b) it won't be mutated once shared. I don't know how to figure that out statically, but that might be a good avenue for research.

[0] https://github.com/stedolan/jq/wiki/C-API:-jv

Someone · on Feb 24, 2018

Some CPUs have a 128-bit compare-and-swap (lock cmpxchg16b on x64). You can build a lockless doubly-linked list with that. See for example https://gist.github.com/glampert/c40f2584d2fbc72316e1c8a6ef1...

As to the doubly-linked list in Rust: I think one could add the notions of “allocated block that must have n references to it” and accompanying “one of n references to an allocated block”. That could lead, for example, to code

   let p1, p2 = heap::allocate2(elem_size, align);

to allocate such a block with two references. The borrow tracker could then ensure that both p1 and p2 get stored once somewhere before the function allocating the memory returns or get returned from the function, or that both get passed to a function freeing the memory, and never used afterwards.

tyingq · on Feb 23, 2018

I respect your expertise here, but pulling up 20,000 feet...shouldn't Rust feel some urgency to provide something like simple-to-use common data structures without pain? Or a reasonable alternative? Current state feels a little science projecty.

nicoburns · on Feb 23, 2018

There are simple-to-use data structures available in Rust. In fact, they are particularly easy access due to the excellent package manager. What is not so easy is to write some of these data structures yourself. Which is a publicity problem for Rust, as these tens to be exercises that people from a C/C++ background want to try in Rust.

awalton · on Feb 23, 2018

> In fact, they are particularly easy access due to the excellent package manager.

People keep screaming this like it's a huge advantage, but having built applications for years, knowing that one of the most difficult parts of modern applications is maintenance, especially of open source dependencies - making sure they're up to date, making sure you are on the same page with the rest of your company, making sure that they're available when you need to build for reproduciblity, making sure you can trust the library, making sure it has a sane interface and maintainer that listens and understands your use cases, etc... I've honestly come to see package managers more as a liability than an advantage. They're tremendous for this Wild Wild West github-pull-requests-are-life style that's become popularized by Javascript programmers, living their life to build one application quickly and move right on to the next... but that's not something I'd aspire to.

Every one of these new languages goes "Woo package manager", everyone codes against them, and then dependencies start stacking up, they start going out of date, APIs change, people move on to other projects, etc, and before you know it you've got 150 copies of "leftpad" and someone deletes the leftpad repository and breaks every build in your company... And this isn't a new story - it's happened to every language I've dealt with that has one of these package managers, from Perl's CPAN to Go.

I want a language to have "Batteries Included", not "Batteries Available by Easy Download from the Internet From Strangers' Githubs". It's the one good thing C++ had going for it - the STL contained the data structures you needed to get going quickly and then it got out of your way - you could bring your own as soon as you needed. You didn't need to track the STL as a dependency and wonder if someone changed the return value of a function - it basically never happened, which meant you didn't feel the need to abstract it or insulate yourself from its API in case it broke.

I want to know my dependencies, and to know that I can trust them, and that my application can be maintained indefinitely, that I won't get stuck on some old library because I assumed this random crate everyone recommended was the best way to do something, then everyone changed their minds and I end up having the dreadful choice of rewriting all my code or taking on maintainership of someone else's abandoned heap.

But I also get that Rust is young and maybe as it matures they'll move some of this stuff downstack so you won't have to just say "Oh I'll just download XYZ package from the internet, what could go wrong..." Or maybe I'm just wrong and this chaos and questionable maintainability is desirable in some insane way, and I just belong to a different generation of developers who want to build lasting applications and not weekend projects...

kbenson · on Feb 23, 2018

I think perhaps this is one of those times you should have looked into the package manager and package management system prior to going full rant. I'm with you on the state of package management in general, and most if not all the problems you outlined were very specifically addressed by rust's package manager and system.

There are solutions for these problems, they aren't perfect, but it's getting better.

lolikoisuru · on Feb 23, 2018

Not Invented Here

This disease is extremely common among programming language developers and enthusiasts. A rust developer or a rust enthusiast just cannot fathom using a generic tool and instead everything has to be rewritten in rust. This applies to pretty much everything that a programmer might use but is especially bad with package managers.

steveklabnik · on Feb 23, 2018

C and C++ do not have a cross-platform, uniform package manager and build system used by most projects. It’s more of a “not invented” than a “not invented here” situation.

madez · on Feb 23, 2018

I share both the views of the grandparent and that we should consider rewriting things in Rust. They are not mutually exclusive.

We need ways to package and distribute software written in a variety of languages. Rust should play nice and cooperate with these. That does not exclude that doing things in Rust may give advantages.

tyingq · on Feb 23, 2018

Ahh, okay. So maybe a "c++ common approach" vs "rust best practices" FAQ? Like "maybe you don't need a c++ like doubly linked list, and here's why" write-up?

quodlibetor · on Feb 23, 2018

Linked lists are in the standard library[1]. The source code is linked to from that page (`src`, on the top-right corner), although the implementation is optimized so it's probably not a great learning resource.

If, for some reason, you actually need a linked list there are tons of implementations available on crates.io[2].

It's not that there aren't linked lists, and it's not that they can't be implemented as efficiently as in C, it's that implementing them is an inherently tricky problem where the obvious implementation can't be proved by Rust's type system so they're an extremely unpleasant way to learn Rust.

1: https://doc.rust-lang.org/std/collections/struct.LinkedList.... 2: https://crates.io/search?q=linked%20list

tyingq · on Feb 23, 2018

I'm not sure that really addresses my question. You seem to be saying C++ folks need to bear the burden of figuring it out. That's fine, but it explains the conflict. If the Rust's team position is that C++ diehards need to fully understand Rust first, then the conflict seems expected. On the other hand, if the Rust team is interested in serious evangelism, something seems missing.

romwell · on Feb 23, 2018

>"maybe you don't need a c++ like doubly linked list, and here's why" write-up?

Done and done, you're welcome:

http://cglab.ca/~abeinges/blah/too-many-lists/book/

mathw · on Feb 23, 2018

I think it's more that there's one in the standard library, so you don't actually need to write one in Rust, but if you look at how they did it you'll find that it was written in unsafe Rust because that's how you do things that rely on pointers for their semantics.

dbaupp · on Feb 23, 2018

A write-up for the latter: https://isocpp.org/blog/2014/06/stroustrup-lists

High level summary: linked lists are usually not the best data structure.

jopsen · on Feb 23, 2018

To be fair, sometimes you have to write a trie or DFA or some other linked data structure.

However, if one is willing to deal with the possibility that the _next_ pointer can be null, then isn't reference counting and with weak references perfectly fine?

saghm · on Feb 23, 2018

I mentioned this in another comment, but it does provide a doubly-linked list in the standard library: https://doc.rust-lang.org/std/collections/struct.LinkedList....

littlestymaar · on Feb 24, 2018

> shouldn't Rust feel some urgency to provide something like simple-to-use common data structures without pain

You can already find most useful data structures in the standard library, including a doubly linked list. (Which probably doesn't even count as a useful one).

imtringued · on Feb 23, 2018

I'm not sure that doubly linked lists are common. They are pretty obscure highly specialised datastructures that are only useful when you really know you need them.

Animats · on Feb 23, 2018

I've previously discussed on HN how Rust could support backpointers safely.[1] The idea is to have a "backpointer" attribute, with some additional checking. Backpointers are non-owning pointers locked in an invariant relationship with an owning pointer.

You need backpointers not only for doubly linked lists, but for some tree-type data structures. The backpointer invariant is easy to check at run-time, and it's often possible to eliminate that check at compile time. With this feature, few if any tree-type data structures need "unsafe". Tree update code is error-prone; you need all the help you can get there.

The other basic thing you can't express safely in Rust is a partially initialized array. If you had syntax for "This array is initialized up to element N", with suitable checking, "vec" could be written safely.

Rust is pretty close on this. Those are the two main gaps in the safety system. Calling external non-Rust code is a separate problem, one which hopefully will decline as more low-level libraries are rewritten in Rust.

[1] https://news.ycombinator.com/item?id=14303858

bbatha · on Feb 23, 2018

I don’t think your solution works. It will make sure the pointer is never null (assuming these back pointers are !Sync) sure, but rust guarantees far more that that. Each ownership type (value, &, &mut) is a capability, it very important to rust guarantees that you can’t go up the chain. Each node have an owned pointer to the next means that it can mutate it at anytime, the back pointer would always alias the previous node meaning that it has to be a weakened “&” reference (allows aliasing, lifetime rules relaxed). Because all of the nodes are owned it’s possible to traverse the back pointer then move forward again and recover ownership or a &mut of the same node twice.

You can however allocate the nodes in an arena and store your forward and back pointers in Cells. You can’t move the arena, but you’ll get nice back pointers. You could also make the linked list be backed by a slab and store indexes instead of node pointers. Both of these solutions will likely be more cache and allocation efficient than a traditional link list which does allocation for every node.

Animats · on Feb 27, 2018

Right, you can go backwards, but not with mutability. Weak refs have some of the same problems.[1] You don't get the ability to mutate doubly linked lists this way. Navigate trees, yes; mutate them from below, no.

[1] https://www.reddit.com/r/rust/comments/3csud3/how_do_rust_we...

jpfed · on Feb 23, 2018

These backpointers sound kind of like the ownership analog to weak references.

Animats · on Feb 23, 2018

Right, they are basically weak references, but more efficient ones. Weak references involve not just a counter, but something that gets deallocated when the counter goes to zero. That adds complexity to a trivial operation.

vvanders · on Feb 23, 2018

Option #3(indexed tree) looks non-ideal but it actually has some really cool properties if you do it right.

1. If you know traversal order you can get really great cache coherency. We used to do this all the time with animation DAGs and the like in gamedev.

2. If everything is an "offset"/index instead of a pointer you can do things like in-place loading where creating something is just a single read() call. No constructors or annoying allocations to slow down loading. If you want to take this even further you can mmap() the entire file on disk and use massive(500mb+) files without using much actual memory overhead through letting the kernel page it in/out for you.

Narishma · on Feb 23, 2018

> 1. If you know traversal order you can get really great cache coherency.

Cache locality, not coherency.

vvanders · on Feb 23, 2018

Sorry yes, had a brain fart there.

_hrfd · on Feb 23, 2018

Why does option #3 look non-ideal, what are your concerns with it?

vvanders · on Feb 23, 2018

Well, non-ideal may be poor phrasing. I guess I meant to say it doesn't look like the "classic" linked list implementation.

squiguy7 · on Feb 23, 2018

There is some good discussion on /r/rust [0] as well. I'm not trying to invalidate the author's post but rather share some more insight.

[0]: https://www.reddit.com/r/rust/comments/7z7p5m/why_writing_a_...

madez · on Feb 23, 2018

> I'm not trying to invalidate the author's post but rather share some more insight.

You don't need to excuse yourself ahead of time. If you wanted to invalidate anything the content of your comment should speak for itself. Every argument is expected to give some insight by default.

always_good · on Feb 23, 2018

Agreed. I'm sure some people think you're being petty but we've wandered way too far into this ultra defensive language on the internet.

I think it's a habit formed from the toxic behavior of people responding to you to try to pin you on some stupid detail, like pointing out that there are exceptions to some uncontroversial statement you made that was only 1% of the point of your post.

zerosanity · on Feb 23, 2018

Just a friendly reminder the you can get the Programming Rust book for only $15 along with a bunch of other good books for 3 more days at the Humble Bundle Functional Programming Bundle. I've been working my way through it and learned a lot so far about Rust.

kovrik · on Feb 23, 2018

I've struggled with Rust just a couple of times, nothing serious, so I'm not an experienced Rustacean by any means.

Question:

Ownership system obviously imposes some limitations, but gives safety in return.

Are there any data-structures or algorithms or something that you simply cannot implement in Rust without using unsafe?

pcwalton · on Feb 23, 2018

In theory, no, because safe Rust is enough to implement the C VM. You could always (again, theoretically) implement whatever data structure you want on top of a giant Vec<u8> heap of memory, asm.js style. Of course, this isn't something you would want to do in practice!

justinpombrio · on Feb 23, 2018

> Are there any data-structures or algorithms or something that you simply cannot implement in Rust without using unsafe?

No. If you wrap every piece of data in your whole program in RefCell, the borrow checker will leave you alone, and it will be like programming in most other languages. (There are some minor differences, like the fact that your program will be refcounted rather than garbage collected, which doesn't deal with cyclic references, but let's ignore those.) Alternatively, you can wrap your whole program in "unsafe{...}", and use raw pointers everywhere, and it will be similar to programming in C.

EDIT: My comment is trying to give a general understanding that will hold most of the time. See the other comments for fun edge cases :-).

kovrik · on Feb 23, 2018

But if you use RefCell, then you won't get any useful compile-time checks, will you?

In other words: if you have Rust _without_ unsafe and without RefCell (and similar stuff), will you still be able to implement anything in it and keep compile-time checks and other benefits of ownership system?

justinpombrio · on Feb 23, 2018

> But if you use RefCell, then you won't get any useful compile-time checks, will you?

If you use RefCell, then the compile-time checks won't be necessary. For example, in Java there are no compile-time checks: everything is just garbage-collected at runtime. Likewise, RefCell is lightweight garbage-collection (modulo cyclic references).

> In other words: if you have Rust _without_ unsafe and without RefCell (and similar stuff), will you still be able to implement anything in it and keep compile-time checks and other benefits of ownership system?

Ah, in that case there are a lot of things you can't implement: Strings, doubly-linked lists (which is the point of this article), trees with backpointers, graphs, vectors, etc. Fortunately, you rarely need to: if you need a data structure, it's probably already implemented in Rust. The standard library has most common data structures, and there are often crates for less common ones. If you do need to write unsafe Rust, it's about as scary as C. I've written a reasonable amount of Rust code, and only ran into one situation where I (think) I need unsafe code.

steveklabnik · on Feb 23, 2018

You’re confusing Rc and RefCell, RefCell is “borrow checking at runtime”, Rc is “lightweight garbage collection”.

justinpombrio · on Feb 23, 2018

Aaagh, yes! I meant Rc everywhere :-(.

Const-me · on Feb 23, 2018

Theoretically no, because Turing complete.

Practically yes, there’re many well-known algorithms processing linked lists, trees and graphs. These algorithms are used everywhere in practice, processing syntax/expression/DOM/filesystem trees, MRU/LRU lists, objects/dependencies/network/pathfinding graphs. Safe rust implementation of these structures wastes too much resources.

ridiculous_fish · on Feb 23, 2018

XOR linked list? https://en.wikipedia.org/wiki/XOR_linked_list

justinpombrio · on Feb 23, 2018

To be fair, you can't implement that in most languages.

_gok2 · on Feb 23, 2018

Something that people don't seem to know how to do (despite several attempts by known rust developers) are zero-copy streaming iterators.

Tracking the lifetimes of references in this way gets really hard really quickly, and rust isn't currently able to work it out

gnarbarian · on Feb 23, 2018

Technically no, as long as safe rust is Turing Complete and the data structures you're talking about aren't defined by unsafe behavior.

Practically, I have no idea.

cwzwarich · on Feb 23, 2018

You can't implement many concurrent data structures that rely on things like acquire/release consistency or single-copy atomicity in safe Rust.

dataflow · on Feb 23, 2018

> I find a bit of solace in the fact that implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky

What? No it isn't. The entire problem here is incorrectly assuming that "A owns B" implies "A has a pointer to B". I don't know if this is a Rust-imposed constraint, but it certainly isn't a logically necessary one. Just do what C++ (and C#, etc.) do: the data structure (std::list, etc.) owns all the nodes, and the nodes merely reference each other. The nodes don't own their siblings. It makes sense and it doesn't get tricky.

dbaupp · on Feb 23, 2018

The difficulty in Rust is reflecting how there's not much difference between a correct doubly-linked list and an incorrect one: it's not too hard to end up with dangling pointers due to bad destruction.

It's (very) hard for a compiler to tell that a back-pointer won't be held around after the thing to which it points is destroyed, or even the pointer to a node after something else deallocates it, e.g. in pseudo-code:

  def remove_from_list(node: pointer ListNode):
      node.prev.next = node.next
      node.next.prev = node.prev

Unfortunately, this would be a use-after-free, if one were to translate that code into C++, using modern features that are always touted as part of "modern C++ is safe enough", with the obvious definition:

  class ListNode {
    std::unique_ptr<ListNode> next;
    ListNode *prev;
    // some data or whatever
  };

One could even switch to 'std::optional<std::reference_wrapper<ListNode>> prev' (i.e. trying to avoid C legacy that people sometimes suggest is the unsafety in C++), and... it doesn't change anything.

Each node is owned by its previous one, meaning 'node.prev.next = node.next' overwrites the owner of 'node', and so the next line is accessing freed memory.

Of course, it's not very hard to fix (or find, in the first place) this particular example, e.g. just swapping the lines, or a completely different ownership scheme ala what you say about std::list. But, the above code looks very reasonable, is a problem even using good modern idioms in C++, and is also only a minor change from correct code (and, C and C++ do not offer much assistance to find or fix these sort of problems).

dataflow · on Feb 23, 2018

Fun fact: The `unique_ptr` approach you mention is also broken (and I would go so far as to say it's wrong entirely). The destructors will blow the stack. I really do think std::list's approach is the only correct solution for a generic linked list.

dbaupp · on Feb 23, 2018

Using the default destructor is broken, yes, but the approach isn't: one can avoid it in the list's destructor using a loop that walks over the list to not let nodes be destroyed recursively (i.e. clearing the next pointers).

    node = head
    while node:
        next = transferOwnershipAndClear(node.next)
        node = next

Where the third line would be 'std::unique_ptr<...> next(std::move(node.next)); ' in C++ that uses unique_ptr<...>, or 'let next = node.next.take()' in Rust that uses Option<Box<...>>.

Of course, you could definitely argue that there's little point using unique_ptr if you're still having to write a destructor for the list itself.

dataflow · on Feb 23, 2018

You can keep putting bandages over it, but the approach itself is fundamentally broken. It fundamentally intertwines the memory management with the data structure management when the two really are pretty orthogonal to each other. For example, if you ever want to use a different allocation function, you'll run into trouble. How do you get the right function to deallocate each node, and at what penalty? Does every node keep a pointer to a common allocator now? Or, for example, if you have a move constructor/assignment like that, now you're declaring that it makes sense to "move" one node onto another, even if they use different allocators. But does it really? Their ownership simply doesn't have anything to do with their sibling relationship, and you're forcing them to be tied together. Like, yeah, you can keep putting bandages on top of this, rub some alcohol on it, and giving it crutches to make it work, and I'm sure you'll eventually make it work, but the right thing to do is to just step back and realize the flaw is in the approach itself: some concerns are global across the nodes (like memory management), and some are local (data structure management), and hence it doesn't make sense to mix the two.

(Oh, and did I point out that all of this means you'll have to find a different solution when the data structure is no longer linear to allow a destructor hack like that? The fact that the scope of the approach is limited by a property that really isn't all that relevant to the actual problem is another sign that it isn't the gift one.)

dbaupp · on Feb 23, 2018

I am not advocating for it being a great way to implement a linked list, just using it as an example that can be easily understood in a throwaway comment.

As an alternative:

  # Delete all the elements from node 'from' to (but not including) node 'to'
  erase(list, from, to):
    from.prev.next = to
    to.prev = from.prev
    while from and from != to:
      from = from.next
      list.allocator.free(from)
    # update 'tail' if necessary

  destroy(list):
    node = list.head
    while node:
       next = node.next
       list.allocator.free(node)
       node = next

This code is also wrong. Calling erase(list, list.tail, list.head) or similar will create a loop, and then destroy will loop forever/read dangling pointers. This is also very close to correct (I believe 'destroy' looking for node == tail instead of node being null would avoid the loop), but isn't.

Yes, it's easy to write a linked list, but it's also easy to screw up in subtle ways, and that line is thin. The safe subset of Rust restricts what is legal to be able to get some sort of control/understanding on pointer-soups, so that it can verify and validate that problems like the above don't turn into really bad problems. (The equivalent thing couldn't happen with vector-and-indices or arena-allocated nodes, and would result in the less dangerous[1] problem of an infinite loop/memory leak with Rc/Weak or garbage collection.)

[1]: A denial of service is better than remote code execution (proof: the latter can be turned into the former).

dataflow · on Feb 23, 2018

So now your rebuttal to me pointing out of this design flaw is that it is possible to write a buggy implementation even in the absence of such design flaws, and that Rust can help you avoid those bugs too. I'm not sure entirely what else I'm supposed to say in response, but yes, that is a true statement.

dbaupp · on Feb 23, 2018

I'm arguing against your top-level original comment; I have no attachment to any particular scheme for implementing linked lists, they're just examples of things that are almost right, but are wrong enough to result in memory corruption.

My thesis is "it does get tricky to implement a linked list without garbage collection".

My original example was simple to avoid having to work through a longer example. Now that I reread, I agree that that particular example isn't relevant.

My second example was demonstrating that even a list fitting into your proposed ownership scheme (which, you say, "doesn't get tricky") does get tricky. You apparently agree that this is problematic, so you also apparently agree that it is tricky to implement a linked list.

Rust rejects most attempts because of the risk of memory unsafety, and it's hard to convince it otherwise, because there's little difference between 'safe' and 'unsafe' in a pointer soup.

dataflow · on Feb 23, 2018

Oh, so all you're arguing against is my statement that "it doesn't get tricky"? In that case I think you're somehow missing the entire point of these discussions. I didn't make that statement in the abstract; it had some context behind it. The thesis of the article (and hence the basis for my comment) was that linked lists are tricky in Rust -- and presumably not (as much) elsewhere, or Rust wouldn't the focus of the article. I am saying, no, that trickiness only comes about because he's insisting on the wrong design, and if he took the same approach as in C++ (or C#) he would no longer encounter it. That's what "it doesn't get tricky" means. If you pull it out of context and make your thesis that "it does get tricky to implement a linked list without garbage collection" in general, then okay, yeah, sure... linked lists are tricky, pointers are tricky, programming is tricky, life is tricky, etc... but then we get nowhere since all of those are entirely missing the Rust context behind the discussion.

dbaupp · on Feb 23, 2018

No, I am disagreeing with your explicit disagreement with "implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky".

Now that I reread for a third time, I suspect you may've picked up on the "a data structure like this" to mean "a linked list with this exact ownership arrangement", whereas I interpreted it as the looser "a data structure like a linked list". That puts the rest of your comments into more context, and sure, I grant you that it's a suboptimal choice, but I don't think it's what the author meant.

In any case, I think I was rather assuming too much Rust context: taking your suggestion does not resolve the trickiness in Rust (as in, you won't be able to convince the compiler to accept it without using `unsafe` or reference counting). It's a very common meme about Rust that there's no way to write a safe C++-stye linked list with pointers, something independent of the choice of ownership scheme. "Fixing" the design doesn't actually help: it's still way too much of a pointer soup.

dataflow · on Feb 23, 2018

> No, I am disagreeing with your explicit disagreement with "implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky". [...]

I see what you're saying, and again: it's missing the entire baseline and context for that claim. My thesis was that if he had known/considered/used the actual std::list design (which I assume he hadn't, or he wouldn't have proposed nodes that owned siblings), he would not have considered linked lists to be tricky in non-GC'd languages according to whatever his baseline is for that (presumably, tricky enough to blog about it). But somehow you simply extracted my reply with that one quote I replied to, discarded all the rest of the context (his blog post and my comment and all), lowered the baseline for "tricky" (from ≈"tricky enough for him to call it 'tricky' and blog about it" to ≈"possible for the average programmer to write an initially-buggy implementation thereof"), and then ran away with this beautiful straw man to refute. =P Except that (obviously, I had thought) wasn't my claim in the first place...

dataflow · on Feb 23, 2018

Oops, typo... s/gift/right/

pjmlp · on Feb 23, 2018

I think the C++ example is safer modeled like this

    class ListNode {
        std::unique_ptr<ListNode> next;
        std::weak_ptr<ListNode> prev;
        // some data or whatever
      };

dbaupp · on Feb 23, 2018

That doesn't work. std::weak_ptr is a reference counted pointer matched to std::shared_ptr; there's no concept of weak pointers (i.e. non-owning with dynamic checks) for std::unique_ptr, which corresponds almost exactly to Rust's Box.

bfrog · on Feb 23, 2018

Semantically thats what the Rust std library linked list does as well https://doc.rust-lang.org/std/collections/struct.LinkedList....

However that LinkedList implementation requires unsafe{} to be implemented. All unsafe really means is that the compiler isn't going to hold your hand, the usual memory ownership footgun is available at your discretion.

unsafe shouldn't be this mythical thing you don't touch like people seem to think it is. If you need to escape the compilers very helpful guidance you can and should, but test thoroughly!

dataflow · on Feb 23, 2018

Going on a tangent, but I honestly think 'unsafe' might suffer from a naming issue. It should've been called 'unchecked' or 'unverifiable' or something that says the code is merely not verified to be safe, not that it is actually unsafe.

xenadu02 · on Feb 23, 2018

Nope, unsafe does exactly what it says on the tin.

C# tackled this problem 15 years ago. I'm sure other languages (Haskell) did it even earlier. When to use unsafe is a judgement call. Each developer and team will have to set their own standards. Some people will abuse it. None of this is new. At first it scares people. They think this is the brave new world, using unsafe feels gross and backwards! Eventually they understand where it is and isn't appropriate.

You might think "so what? Why even bother with a safe-by-default language?"

Because it greatly restricts the problem space. Rather than being forced to examine every line of code for every possible bit of undefined behavior or every path of flow control for memory errors you only need to think really hard about edge cases inside the unsafe blocks. Simply by virtue of being a relatively small number of blocks of few lines the problem of safety and correctness becomes easier to understand. Easier to test. Easier to reason about.

Unsafe is a tool. It's a dangerous tool so you should always wear your gloves and safety goggles. But when faced with a problem for which it is the best tool you should use it without regret.

derefr · on Feb 23, 2018

> Nope, unsafe does exactly what it says on the tin.

Depends on how you interpret the name—whether it's referring to what it does (makes things no longer automatically safe), or whether it's referring to what the code inside it does.

If you write only safe code, inside an unsafe{} block, then nothing unsafe is happening. Fewer compile-time static-analysis checks are happening, but if you manually verify that the code is "safe anyway" the way C/C++ programmers are forced to do, then you can declare that the code is safe, maybe with a comment like:

    // this is safe
    unsafe{ ... }

That seems bizarre and contradictory, no? But it would seem less weird if it was:

    // this is safe
    unchecked{ ... }

Of course, there's no reason you should be using an unsafe{} block for only safe code, so unsafe{} is usually a pretty good label for the content of the block.

gilbetron · on Feb 23, 2018

All of C and C++ is basically unsafe, and we got a lot done with it! It's ok to use unsafe from time to time if you really need to :)

_diyu · on Feb 23, 2018

Right but the hope was that Rust was all the way safe, not just most of the way safe. That’s it main niche. That’s why someone would choose it over C++. But if 95% of your code is safe and 5% of your code isn’t, and the safe 95% uses that unsafe 5% all over the place, it inherently makes the safe code an entry point into unsafe code, kinda-sorta making it unsafe too. So it ends up feeling like all the hard work to keep things safe was a waste.

reificator · on Feb 23, 2018

But when you're debugging, you know where to focus your efforts.

If 95% of your code touches the other 5%, then that 5% is probably pretty important and useful and hopefully fast. Spending some extra time to verify safety in exchange for speed/control is a small price to pay, and will pay dividends from the other 95% of code that doesn't have to be inspected so closely.

nickm12 · on Feb 23, 2018

No, the objective for Rust was never to be all the way safe.

Rust gets much inspired by C++ and seeks to be a systems language where " there should be no room for a lower-level language between [it] and native machine code".

If you want that, you need unsafe blocks. The intent is to use those blocks to build safe abstractions that can be used for the lion's share of your program.

dataflow · on Feb 23, 2018

> "there should be no room for a lower-level language between [it] and native machine code"

I hadn't heard this before, and I love it! This seems like a very nice way to define what a systems programming language is. :-)

dataflow · on Feb 23, 2018

I'm not having a problem with it personally, I'm just saying that trying to educate the developers instead of addressing it on Rust's end seems like a potentially losing battle, as unfortunate as it might be.

pjmlp · on Feb 23, 2018

Actually this problem was tackled in NEWP during the 60's.

Section 8, UNSAFE mode

http://public.support.unisys.com/aseries/docs/clearpath-mcp-...

madez · on Feb 23, 2018

I hope we don't settle for unsafe being okay forever. Right now, sometimes it is the right thing to do and there shouldn't be any regret. But in the future, I hope Rusts compilers become better.

There are two things I consider necessary for that. First, that the Rust compilers become smarter in proving the safety of things by themselves. Second, that the Rust compilers become capable of verifying proofs given to them that show the safety of a given piece of code the compilers can't prove as safe on their own.

pcwalton · on Feb 23, 2018

I think that it is a worthwhile goal to be able to someday formally prove all the unsafe blocks correct in, say, the standard library and popular crates.

However, I honestly feel that the Rust language itself isn't really the right language to be doing these kinds of proofs in. I think that the right language to do the actual formal verification in is likely to be something closer to Coq. Whoever undertakes this effort would probably use an automated theorem prover to prove the unsafe Rust code correct, like was done for seL4 using Isabelle.

You can think of this setup as offering a sort of layered verification: once the small core of code (the unsafe code in the standard library and popular crates, say) is proven correct, the type system and borrow checker effectively prove the rest to be memory- and thread-safe. In fact, that's what would make this system practical: most programmers wouldn't have to understand anything about the complexities of the theorem prover. They would get the benefit of verified memory- and thread-safety for free just by learning Rust.

madez · on Feb 23, 2018

I also thought foremost about standard libraries and other often used code, or say, critical code in some OS.

We don't need to verify every single occurrence of unsafe, but whenever unsafe is necessary I feel the lack of some other guarantee holding our back.

Having optional small-scale(!) verification in Rust would be awesome.

Using other theorem provers has the same problems as when one tries to establish tools for checking memory safety in C: it isn't the default, just an addon.

In my eyes, there is a scale of verification-readiness in which Rust can position itself. The least ready would be having it completely separate and done by other tools in other files, the most ready would be having syntax for it in the language, having it in the same files, and checked by the standard compiler.

I think every bit of verification-readiness Rust has by default will have a strong effect that we can't achieve through other means.

Maybe, some things don't impact those not interested in verification. Say, have next to 'safe' and 'unsafe' also the 'verified' environment that also contains proof language as a core part of Rust. That way all ordinary code is still valid and everyody is free to use Rust without verification in mind.

Any ideas what could be done to make Rust verification-readier?

nickm12 · on Feb 23, 2018

Rust needs unsafe to be Rust. It's reasonable to expect (safe) Rust to become more expressive over time so that more things can be written in safe Rust (or written more conveniently) but expecting the escape hatch to go away entirely is misguided. It is as unlikely as C++ or C trying to do away with inline assembly.

pjmlp · on Feb 23, 2018

Microsoft no longer supports inline assembly on the 64bit compilers, only intrisics.

I also imagine that it isn't allowed when using Bitcode deployment on iDevices.

andrewflnr · on Feb 23, 2018

IMO that's less of a Rust problem than a problem for the next generation of verified languages. Probably total functional languages, or maybe just usable versions of Coq.

Retra · on Feb 23, 2018

The Rust compiler will not solve the halting problem. It is pretty trivial to write programs which are safe if and only if they halt. So 100% safe is simply absurd.

madez · on Feb 23, 2018

That is trivially obvious, but writing a performant vector implementation is not solving the halting problem.

Let me restrict it to the faintly weaker "I hope we don't settle for unsafe being okay forever where we can prove safety".

andrewflnr · on Feb 23, 2018

You don't need to solve the halting problem just to verify an existing proof of a semantic property, nor to use smarter heuristics to avoid requiring such a proof. 100% safe is totally reachable, though I bet the syntax would be pretty hairy added to today's Rust.

jononor · on Feb 23, 2018

Would need to get rid of C FFI, as that cannot be 'safe' in Rust?

andrewflnr · on Feb 24, 2018

Hmm, good point. You'd have to extend the formal verification into the C code, at least. If you can do that, it might be easier to just write verified C.

Retra · on Feb 24, 2018

Inline ASM can't be verified either.

andrewflnr · on Feb 24, 2018

Inline ASM is exactly as verifiable as the underlying CPU, given an adequate model in the verifier. That's probably easier than verifying C, which introduces extra ambiguity in its semantics. But yeah, verifiable CPUs would be nice.

kbenson · on Feb 23, 2018

That may be due to a quirk of how we use it in English (and maybe other languages, but I can't speak to that). For some reason, unsafe or not safe isn't always perceived as a logical negation as some other concepts usually are, and is instead parsed as opposite of safe. "Verified" seems to suffer from this less, but is also less strong in what it positively implies.

Is it possible that the link to personal safety of the terminology is what provides the best positive connotation in terminology but also causes this unwanted ambiguity in negated interpretation? If so, that's an annoying catch-22...

j1f4 · on Feb 23, 2018

The word that I have seen in similar contexts is 'trusted', which I like and would have preferred -- the block has extra privileges and isn't machine verified. Some people tend to give 'trusted' an opposite reading when they first come across it, though.

lmkg · on Feb 23, 2018

The problem with that word is it doesn't say who is doing the trusting, which is the crucial point. In fact, "trusted" can be used to describe both safe code and unsafe code. In the safe code, the programmer is trusting the compiler. In unsafe code, the compiler is trusting the programmer. Both code environments are "trusted," but the trust is being given to different parties.

dataflow · on Feb 23, 2018

Oh yeah, 'trusted' would definitely give the opposite of the intended meaning! I had to read your comment twice just to get why you were calling it that.

contravariant · on Feb 23, 2018

To me it makes perfect sense that the keyword 'unsafe' disables safety mechanisms.

Narishma · on Feb 23, 2018

But it doesn't disable anything. In fact, it does the opposite: it enables additional mechanisms which can't be checked by the compiler.

contravariant · on Feb 23, 2018

That's a bit like saying that unlocking your front door adds an additional door to your house.

__s · on Feb 23, 2018

Agreed. Pointers are unsafe. References are safe, & almost powerful enough to make one think they don't need pointers. But if you find yourself needing pointers over references, they're there

kovrik · on Feb 23, 2018

> However that LinkedList implementation requires unsafe{} to be implemented.

Can it be implemented without using unsafe?

Do they have in plans to get rid of all unsafe from std?

kinghajj · on Feb 23, 2018

No on both counts, the entire point of "unsafe" is to form safe abstractions around unprovably safe code.

viraptor · on Feb 23, 2018

It can't. And there are many things std does that are in unsafe blocks. There's nothing wrong with that if you have a specific use case for it. (And wrap that functionality in a safe function)

stochastic_monk · on Feb 23, 2018

I find that evangelical Rustaceans often speak ill of C++ without knowing how one would solve a problem in C++, especially C++11/14/17.

I’ve been told that “it takes a decade to grow a good C++ developer”. I think that’s an overestimate, having only written it for a few years, but it does take a lot of skill to write good C++, while Rust’s borrow-checker enforces safe practices.

vvanders · on Feb 23, 2018

> The entire problem here is incorrectly assuming that "A owns B" implies "A has a pointer to B". I don't know if this is a Rust-imposed constraint in some fashion, but it certainly isn't a necessary one.

Rust has support for weak references[1] to support this exact use case.

For 90% of uses cases A owns B tends to produce much cleaner architecture. Having ambiguous ownership is just a recipe for resource leaks and non-deterministic behavior.

[1] https://doc.rust-lang.org/std/rc/struct.Weak.html

dataflow · on Feb 23, 2018

Hmm... that doesn't make sense. Weak references are for refcounted values. There does not need to be any refcounting involved here at all.

Retra · on Feb 23, 2018

There's no safe, general way to do shared ownership without ref counting or garbage collection. You can't expect Rust to provide one; it already provides ref counting and raw pointers, which is as good as any other Non-GC language does.

vvanders · on Feb 23, 2018

Yes but single owner and refcount of 1 are semantically the same. If you want to be strict about this just keep the Rc private to the implementations and only hand out Weak<T>.

dataflow · on Feb 23, 2018

So you're suggesting we add a refcount to every value because it can be made to be semantically the same as not having one?

vvanders · on Feb 23, 2018

If you want to stay in safe Rust, sure. If that's too much overhead feel free to drop down to unsafe and do it just like in C/C++.

FWIW Rc is pretty lightweight. Arc is the heavier(and threadsafe) variant.

foota · on Feb 23, 2018

I think you mean "The entire problem here is incorrectly assuming that "A has a pointer to B" implies "A owns B"."

Although perhaps I'm wrong.

dataflow · on Feb 23, 2018

Not quite what I meant. What I meant was that the hidden assumption is that ownership must be expressed through a direct pointer, leading him to ignore the possibility of something like std::list, which doesn't have a direct pointer to all the nodes.

foota · on Feb 23, 2018

Aw, I see now. Thanks!

zanny · on Feb 23, 2018

Nobody mentioned that the Rust std has a doubly linked list in it? [1]

It uses shared pointers to reference other nodes in the sequence.

[1] https://doc.rust-lang.org/src/alloc/linked_list.rs.html#46-5...

rusbus · on Feb 23, 2018

Tragically, this doesn't solve one of the only reasons to use a linked-list: constant insertion in the middle.

teacpde · on Feb 23, 2018

I recently started to actually write some Rust code after reading about Rust here and there. The experience has been quite unique, it constantly forces me to think about the code at low level, which I find refreshing. And the compiler is truly impressive, it pinpoints me where things go wrong, and conveys the error messages in a very human-like fashion.

hcs · on Feb 23, 2018

> // I actually don't understand why the line below compiles.

> // Since `head` was moved into the box, I'm not sure why I can mutate it.

> head.next = Some(Box::new(next));

I'm fairly new to Rust myself, but it's my understanding that since it was moved into the Box, the variable "head" is now just considered effectively uninitialized, so you can go ahead and set its fields, or overwrite it entirely with head = Node {...}, without affecting the value that was moved into the Box.

Rusky · on Feb 23, 2018

That is precisely correct. It also works before the variable has even been initialized to begin with: https://play.rust-lang.org/?gist=faf1642e4e1f48decbeac704505...

AceJohnny2 · on Feb 23, 2018

Tangentially, I'd love to see some list of "what does this language make easy" (C: raw memory manipulation!) and "makes hard" (C: memory-safe code)...

Does one exist for Rust?

leetcrew · on Feb 23, 2018

tbh I don't know enough about rust or the whole field of software development to give an authoritative answer, but if you don't know much about rust, this might help you.

in general, rust is great for pretty much anything c is great for. you can do raw memory manipulation in unsafe blocks if you want, but you write memory safe code by default. at present, rust is definitely slower than c, but there's no inherent reason that it has to be so; mainly it's just the consequence of being a new and immature language.

one neat thing you can do with rust is build safe interfaces to the c libraries that you know and love for a relatively small performance penalty. I am writing a toy graphics application in rust, and it is so much nicer than bare OpenGL, although there are some serious pain points with library maturity.

it can also be a decent substitute for problems you might solve in C# / Java / other statically-typed languages, although it is a bit more strict and explicit than those.

the main things I can think of that rust makes hard are applications where you really don't want static typing (web dev, scripting, etc.) or you have a need to use a lot of specific libraries that you don't feel like writing interfaces for.

zenhack · on Feb 23, 2018

I will contest the idea that you don't want static typing for web dev. Curious, what statically typed languages do you have experience with? I ask because you only mention C# and Java, and I often find people who form their ideas about types from those languages think they have to be much more cumbersome than they really do.

leetcrew · on Feb 23, 2018

> what statically typed languages do you have experience with?

c, c++, java, rust, c#, so the assumption you're making is probably correct.

zenhack · on Feb 23, 2018

If you're at all interested in taking the red pill: elm and reason/ocaml are some things worth checking out. There is so much more out there than the C family tree.

Const-me · on Feb 23, 2018

> rust is great for pretty much anything c is great for

No it’s not.

> you can do raw memory manipulation in unsafe blocks if you want

You can but Rust is not great at it. At least not in comparison with C or C++.

In C, people have been writing these unsafe memory manipulations for half a century, accumulating knowledge, working on runtime & libraries, and building tools like verifiers, debuggers, profilers. C++ has even more of these, adding some safety features to language and standard library while still being C compatible.

leetcrew · on Feb 23, 2018

most of this seems to be an issue of maturity for the language itself and the surrounding ecosystem, which I did touch on a little bit. I probably should have made it a more explicit caveat.

aside from ubiquity and maturity, I'm curious what features c has that rust lacks for raw memory manipulation. I have yet to work my way through all of the rustonomicon and I am far from an expert c programmer, so I would appreciate the opportunity to fill in some knowledge gaps.

Const-me · on Feb 23, 2018

I agree that for C it’s mostly the ecosystem i.e. external tools & libraries. There’re just a few features in the language and runtime, such as debug heap and the preprocessor.

But C++ has a lot to offer besides tooling. Modern-style C++ (template containers, smart pointers) solve the majority of safety issues solved by safe Rust. But C++ doesn’t hide these raw pointers behind safe abstractions, and that higher level safer stuff is optional.

For specific features helping with raw memory manipulation see e.g. routines from <algorithm> header, http://en.cppreference.com/w/cpp/algorithm They are part of C++ standard library, and yet they support dangerous C arrays, because raw C pointers double as C++ iterators.

steveklabnik · on Feb 23, 2018

I’d characterize the situation differently; Rust handles a lot of things that Modern C++, and not even the Core Guidelines, try to handle. Iteration invalidation is huge. Concurrency and parallelism issues are huge.

I very much welcome these things, as I’m about safe software, not “only Rust”, but I don’t think the “most” claim holds water.

Const-me · on Feb 23, 2018

It’s all about tradeoffs.

If you prioritize performance over safety, there’s no way around these problems. To implement efficient algorithms processing trees, lists or graphs, you have to fallback to unsafe code & potentially invalid raw pointers. Unsafe Rust is as dangerous as C++, but for C++, standard library, runtime and tools provide huge help implementing and debugging such things.

If you prioritize safety over performance, of course Rust is way ahead of C++. But Java and C# are much easier to use and deliver safety comparable to Rust. Also in these languages, trees & graphs might be even faster than in safe Rust.

steveklabnik · on Feb 23, 2018

We believe this is a false dichotomy. If Rust is significantly slower than C++, it’s a bug. Generally, most of the time, we have succeeded at this, sometimes being faster, sometimes slower.

Most of those tools work on Rust as well. And we are pretty sure we can have better tools in the future, but that’s a while off.

Const-me · on Feb 24, 2018

> If Rust is significantly slower than C++, it’s a bug.

Safe Rust is significantly slower than C++ when working with pointer based stuff like trees and graphs. That’s why people use unsafe Rust for that kind of code.

> Most of those tools work on Rust as well.

It’s not just tools, also language and libraries.

Rust was designed for safety, and apparently unsafe was just neglected. Or maybe it was a decision to neglect making people use safe Rust instead (BTW same decision was made by Java’s designers at Sun).

In any case, the current state of unsafe Rust is not OK. AFAIK, stable unsafe Rust doesn’t even has malloc & free functions.

steveklabnik · on Feb 24, 2018

Yes. Unsafe code must exist for Rust to accomplish it's goals. Rust is also a practical language, and unsafe is an important part of that. The key is, unsafe is a relatively small percentage of code overall; even operating system projects have a very small amount of unsafe.

> apparently unsafe was just neglected.

I wouldn't agree with this. We work on unsafe things all the time; for example, NonNull<T> was just stabilized. As with any open source project, stuff gets done as people have the time and desire to do it.

> AFAIK, stable unsafe Rust doesn’t even has malloc & free functions.

So, this is literally true, but it's not because we hate unsafe or something. It's because a good allocator API is hard. We've been putting a lot of work into it over the last year or so, and it's actually pretty close to being in stable. The team moved to stabilize it back in October of last year, but some last-minute stuff has popped up.

Const-me · on Feb 25, 2018

> unsafe is a relatively small percentage of code overall

I don’t think it’s a good idea to talk about percentage of code overall when discussing Rust. Take a look at http://githut.info/ you’ll see that majority of overall code is higher-level GC languages.

I’ve been professionally developing software since 2000, have a lot of experience with different languages and platforms, and I’m speaking from my experience. There’re 2 major reasons why now in 2018 I still pick C++ for some software or some components of it.

(1) Code that relies heavily on native interop. Like OS APIs for lower-level device IO, advanced networking stuff, GPU interop, other OS APIs. All of these APIs are C, or sometimes on Windows it’s C++.

(2) Performance-critical CPU bound code. One part of that is SIMD, but pointer-based structures also help a lot, and they are not small percentage of my code. BTW, another thing missing in current Rust is custom allocators. In C++ I can implement a custom allocator in just a couple hundred lines of code, and plug it into any collection for non-trivial performance gain in some use cases: https://github.com/Const-me/CollectionMicrobench Another C++ feature helping me with these performance-critical calculations is OpenMP.

Of course, Rust evolves quite fast, and it may change some day. But in its current state, I don’t think Rust is an adequate C++ replacement for the kind of problems I solve in C++.

steveklabnik · on Feb 25, 2018

GitHut shows data from 2014, incidentally.

Rust easily accesses C/OS APIs, and has great tools like Rayon for paralellizing code. Both wrap unsafe and have nice safe interfaces too.

If you like C++, you should use it, though. Not everyone will use Rust. That’s 100% okay.

dbaupp · on Feb 24, 2018

> AFAIK, stable unsafe Rust doesn’t even has malloc & free functions.

It's stable to import the C malloc/free, and it's relatively easy to use Vec as an allocator: https://www.reddit.com/r/rust/comments/7yhhq6/borrow_cycles_...

pjmlp · on Feb 23, 2018

C++ iterators might look like pointers they aren't necessarily pointers implementation-wise.

Const-me · on Feb 23, 2018

That’s true. Nevertheless, you can pass pointers to C++ algorithms accepting iterators.

smitherfield · on Feb 23, 2018

> "what does this language make easy" (C: raw memory manipulation!)

Depending on what you mean by "raw memory manipulation" that can actually be surprisingly difficult with C, without running into undefined behavior, at least technically-speaking. e.g. type-punning through unions is defined by all implementations I can think of, but technically UB according to the standard.

That said, if you know the UB rules, C/C++ are still the "nicest" mainstream languages to use for that sort of super-low-level manipulation of raw object representations. The non-UB type-punning methods in C/C++ (`memcpy(3)`, casting addresses to pointer-to-`char`), finicky as they are, are still more ergonomic and, ironically, safer than Rust's poorly-designed `mem::transmute()` API.

Language-lawyering is quite the rabbit hole, BTW; for example, it was discovered a few years ago that the wording of the formal C++ memory model defined in the C++11 and later standards means all non-trivial C programs (technically-speaking, not according to any existing or sane implementation) invoke undefined behavior in C++. We'll be lucky if the bikeshedding over the fix for this gets resolved in time to make C++20.

  #include <stdlib.h>

  void *safe_ptr(void *p) {
  	if (!p) exit(1);
    	return p;
  }

  #define INIT(TYPE) safe_ptr(malloc(sizeof(TYPE)))

  int *get_intbuffer(void) {
  	enum {N=3};
  	int *const intbuffer = (int*)INIT(int[N]);
  	int i = N;
	/* Technically UB in C++; lifetime starts
	 * w/ declaration or initializing w/ new
	 * or placement-new, so reads/writes thru
  	 * the result of malloc w/o initializing
  	 * the objects w/ placement-new are UB */
  	while (i--) intbuffer[i] = i;
  	return intbuffer;
  }

dbaupp · on Feb 24, 2018

> are still more ergonomic and, ironically, safer than Rust's poorly-designed `mem::transmute()` API.

Eh, I disagree. I find:

    let x: f32 = ...;
    let y: i32 = mem::transmute(x);

is nicer than

    float x = ...;
    int y;
    memcpy(&y, &x, sizeof x);

And, in terms of safety, transmute manages the sizes of everything and makes sure they match. There's no risk of passing the wrong size, or having types with mismatched sizes (resulting in either "slicing", or buffer overflows).

The only way I can see transmute being more dangerous than the equivalent is that one can allow the types to be inferred, which can result in badness if the inferred type isn't what the programmer is expecting. This is especially dangerous when type involves a lifetime. (I suppose one could argue that it also makes this dangerous operation easier to do, since there's an easy-to-find function for it. But... this goes both ways, since it also stops people being tempted into using undefined pointer casting like ^(float ^)(x) (using ^ for * since HN likes italics too much).)

> Language-lawyering is quite the rabbit hole, BTW; for example, it was discovered a few years ago that the wording of the formal C++ memory model defined in the C++11 and later standards means all non-trivial C programs (technically-speaking, not according to any existing or sane implementation) invoke undefined behavior in C++. We'll be lucky if the bikeshedding over the fix for this gets resolved in time to make C++20.

I'm curious if you've got a defect report link or similar, because my reading of "6.8 Object Lifetime" in the C++17 draft implies that the lifetime has started for those int objects since the two conditions are both satisfied:

- "storage with the proper alignment and size for type T is obtained": the malloc'd pointer is fine.

- "if the object has non-vacuous initialization, its initialization is complete": ints have vacuous initialization.

But, I'm probably wrong, and would appreciate being corrected.

camgunz · on Feb 24, 2018

I think the Rust version is a lot less clear. I don't intuitively know what "transmute" is; if I guess it means "mutate across something... here from a float to an int" I don't know how it knows I mean int, so I have to suppose there's something special in Rust that lets me assign some bytes to a scalar like this; I worry that it does a copy instead of just mutating the bytes of `i32`; finally I'm unclear whether or not this uses unsafe (I bet that it does though). In contrast I know what memory is and I know what copying is.

This is maybe the core of what bothers me about Rust? There are good and valid arguments about C, but its core attraction is "everything is a number, memory is an array". I think Rust has achieved a great thing, but I wish it were like, 50x simpler. I would give up so many things for that.

dbaupp · on Feb 24, 2018

> I don't intuitively know what "transmute" is

That's fair, but... it's easily resolved by using Rust/reading the documentation. Humans don't intuitively know what, say, 'float' in C means, either, but they learn it quickly.

Additionally, the non-Rust meaning of "transmute" is pretty close to how it's used in Rust: "To change, transform or convert one thing to another, or from one state or form to another".

There's definitely a lot of good in making things obvious to beginners to the language, but there's also always a general collection of "jargon"/symbols one ends up having to learn. Someone's who's a beginner in Rust but familiar with C may spend more time working out what the transmute line means than the C, but someone familiar with Rust wouldn't (and certainly not if they're not familiar with C: they'd have to work out the src/dst order for memcpy).

> I think Rust has achieved a great thing, but I wish it were like, 50x simpler

As others have pointed out elsewhere many times: what would you remove? The core of what people complain about in Rust (lifetimes etc.), is also core to achieving it's goals. All of those core features are pretty orthogonal and fairly minimal.

camgunz · on Feb 28, 2018

So I looked up transmute, and it turns out this is how it's called:

    let x: f32 = 0;
    let y: i32 = std::mem::transmute::<f32, i32>(x)

I'm fine with this, actually. With the type information it's clear. I do think it's a little long-winded, but I don't see a way to shorten it without giving up namespacing or safety.

> As others have pointed out elsewhere many times: what would you remove? The core of what people complain about in Rust (lifetimes etc.), is also core to achieving it's goals. All of those core features are pretty orthogonal and fairly minimal.

I'm pretty OK with lifetimes. I think probably they could be a little more explicit and have better syntax, but I think they're complicated by nature and -- as you point out -- core to achieving memory/data race safety in Rust.

I'm happy, however, to create a big gripe list here for you, haha :) Mostly my criticisms are:

- some concepts are unclear because they use overloaded keywords or imprecise language

- some features are confusing due to inconsistent structure/syntax

- some (many) features aren't worth their complexity

Innnnnnn order of the Rust book:

Rust is often different for no real reason. Casting is a good example of this; what was wrong with `(i8)thing`? What does `as` gain us? Couldn't we have used `as` in a more powerful way, like context managers in Python for example?

I'm not wild about `macro!`. Because macros are hygenic, I don't care if something is a macro or not and the `!` makes those calls stick out unnecessarily. I would have preferred that `!` indicated mutation in some way, like in Scheme for example. Or maybe get rid of the `mut` keyword and use `!` in variables. Really anything would be better.

I don't find "everything is an expression" to be that valuable, and it leads to a lot of weirdness. Semicolons suppressing block expression values is one -- why would you ever want to use a block as an rvalue to only assign `unit` to the lvalue? Returning from loops is another, and that looks very weird: `break <expr>`. I can see how if/else as expressions is nice shorthand in a ternary kind of way, but I wouldn't (and don't, in any other systems language) mind just initializing my variables with zero values and updating them inside those blocks.

Generally I like destructuring, but I think Rust's struct destructuring is a little much. I wouldn't mind having... I don't know 4 more lines of variable initialization the 0.0008% of the time I need to do this.

I would prefer that ranges weren't special syntax and instead worked just like any other iterator. This also avoids weird parens in range-based pipelines.

I'm a little on the fence with `match`. I think it strikes the right balance between Python's "use if/else for everything" and C's restrictive/dangerous switch, but most match blocks I see are messy. I think all in all I'm into it, and I like that it's got a history in other languages like OCaml, I just wish I could come up w/ a way to clean it up a little.

match guards make me crazy though; that's exactly what if statements are for. The whole reason I like match/switch more than if/else is that it's restricted to the thing you're matching/switching on. A match guard can run a conditional on anything. It's completely superfluous if your language has if/else.

You can probably guess I don't like match binding either. First I think that's what `let` is for, but I also think using `@` is both very non-intuitive and a big waste of an operator -- all to avoid a single `let` expression.

if/while let I really like, but its weird that they don't follow the same rules as regular let expressions, like can `Some(i)` be an lvalue normally (no, it can't). I would prefer that they did follow the same rules, or that they used different syntax than `let` because they are in fact different.

I wish closures taking no arguments didn't look like the boolean or operator. Using `cfn()` (closure function) here wouldn't have been terrible, I don't think.

`pub` doesn't really mean "public", it means "visibility" and pub without modifiers means "visibility(public)". I do sort of like `mod`, but I dislike having to nest things inside of it. I would prefer a file-based approach, where you declared the file's module at the top.

I would prefer `impl <trait> on <struct>` instead of `impl <trait> for <struct>` because `for` already means something else.

I would have liked to use traits as types in function signatures instead of the clunky generic syntax. Alternatively just require the `where` clause. Both options reduce the number of things you have to know.

I wish the "new type idiom" didn't reuse `struct`. Probably `type` goes there.

I don't really like the ceremony around heap allocation. I know there are a lot of benefits to `Box`, but I'd prefer `*` to Box<T> and `alloc` or even just `new` to `Box::new`. Or hey, if you're against operator reuse, let's use `@` now that we've tossed match binding. Mainly my complaint is this is unnecessarily different from other systems languages.

---

Probably these seem like really small issues, but I really think that all these things together would make Rust much more consistent and clear at a very slight cost to ergonomics.

dbaupp · on March 1, 2018

> Rust is often different for no real reason. Casting is a good example of this; what was wrong with `(i8)thing`? What does `as` gain us? Couldn't we have used `as` in a more powerful way, like context managers in Python for example?

(type)expression is annoying to parse: in, say, '(i8)x' there's no way to tell that it's a cast expression until you get to the x. That is, (i8) is a valid expression (if there's a variable called i8), and is something like (x)(y) a cast or a function call?

In any case, destructors and move semantics gives most of the benefits of context managers.

> I'm not wild about `macro!`. Because macros are hygenic, I don't care if something is a macro or not and the `!` makes those calls stick out unnecessarily. I would have preferred that `!` indicated mutation in some way, like in Scheme for example. Or maybe get rid of the `mut` keyword and use `!` in variables. Really anything would be better.

It's a bit of a personal argument, but being explicitly marked highlights where weird things may happen, like side-effects that happen twice or returns out of the current function (a function call itself can't return: only `return`, `?` and macros that use them). But sure, it's something people might not like.

> Semicolons suppressing block expression values is one -- why would you ever want to use a block as an rvalue to only assign `unit` to the lvalue?

This is just consistency. Why have block-as-an-r-value as the only case when a semi-colon isn't allowed? Furthermore, there's places for blocks that are formally r-values, but perfectly legitimately have type (), like the arms of match blocks and bodies of closures. And, lastly, not having a special case like this makes writing macros easier.

> I'm a little on the fence with `match`. I think it strikes the right balance between Python's "use if/else for everything" and C's restrictive/dangerous switch, but most match blocks I see are messy. I think all in all I'm into it, and I like that it's got a history in other languages like OCaml, I just wish I could come up w/ a way to clean it up a little.

Match is necessary for working with `enum`s, and enums are great tool for avoiding allocations (e.g. Option<T> instead of a T* that's possibly null) and generally for guiding towards type safety. Having a single entity that does the complete deconstruction of an enum is important with move semantics, or else one would be forced to do a lot of extraneous nested "as_mut"/"is_none" etc. checking, especially for nested deconstructions.

> match guards make me crazy though; that's exactly what if statements are for. The whole reason I like match/switch more than if/else is that it's restricted to the thing you're matching/switching on. A match guard can run a conditional on anything. It's completely superfluous if your language has if/else.

It's very convenient for conditionalizing based on enums... and yes, it's unrestricted, but that's for consistency: why restrict it?

> You can probably guess I don't like match binding either. First I think that's what `let` is for, but I also think using `@` is both very non-intuitive and a big waste of an operator -- all to avoid a single `let` expression.

Do you don't like binding any variables in a match, or specifically doing it with @? I don't think you can use a let to emulate it, at least not without a lot of clunkiness. In any case, it's rarely used and rarely seen, and yes, probably not worth `@` (a keyword could be better).

> if/while let I really like, but its weird that they don't follow the same rules as regular let expressions, like can `Some(i)` be an lvalue normally (no, it can't). I would prefer that they did follow the same rules, or that they used different syntax than `let` because they are in fact different.

If they followed the same rules, they wouldn't be conditional and there would be no point. One can regard the 'if' and 'while' addition as exactly that difference in syntax: "if" means conditional, so an "if let" is a conditional let. I would think adding more keywords/syntax has a larger downside than the small expansion of let's behaviour, but it's hard to say without actually being able to compare it in practice. The "killer" argument for me is that other languages use the same syntax, so there's no particular reason for Rust to be different here, given it's mostly an aesthetics argument.

> `pub` doesn't really mean "public", it means "visibility" and pub without modifiers means "visibility(public)". I do sort of like `mod`, but I dislike having to nest things inside of it. I would prefer a file-based approach, where you declared the file's module at the top.

Rust does have a file-based approach.

"pub" does mean "public", just more restricted than globally. Which, to be fair, is usually what public means in english. Visibility/access control has been an endless argument in Rust.

> I would prefer `impl <trait> on <struct>` instead of `impl <trait> for <struct>` because `for` already means something else.

'for' means something else in a completely different context. I don't really see the benefit in distinguishing them, but sure, I guess you could rename a keyword.

> I would have liked to use traits as types in function signatures instead of the clunky generic syntax. Alternatively just require the `where` clause. Both options reduce the number of things you have to know.

You'll be excited for some of the "impl trait" stuff.

> I wish the "new type idiom" didn't reuse `struct`. Probably `type` goes there.

It's literally just an idiom built on top of a struct. There's nothing special about it. I take it you want the idiom to be baked into the language to be slightly nicer.

> I don't really like the ceremony around heap allocation. I know there are a lot of benefits to `Box`, but I'd prefer `` to Box<T> and `alloc` or even just `new` to `Box::new`. Or hey, if you're against operator reuse, let's use `@` now that we've tossed match binding. Mainly my complaint is this is unnecessarily different from other systems languages.*

Box is (only very slightly) special: it's more consistent to not preference it over other pointer types. In any case, Box literally used to be "~" and Rc "@". People complained endlessly about the impenetrable sigils.

Also, Box shouldn't be that common in most Rust code.

camgunz · on March 1, 2018

> (type)expression is annoying to parse

Ehhh I think it's not that bad, no harder than arithmetic expression parsing certainly. It's just one more grammar rule. Plus Rust supports everything that's needed already because of operator overloading.

> In any case, destructors and move semantics gives most of the benefits of context managers.

They (well, destructors anyway) are less explicit though. When you use a context manager in Python you know it's cleaning things up. When you "use" a destructor in Rust you usually don't ever know you did. This can get you into trouble if you're relying on RAII to cleanup after you: you can get a handle to things and then enter a long loop or call chain. Sure you can avoid that by calling `drop`, but if context managers were the idiom this kind of thing would never be an issue. But I admit the difference is pretty small -- and in fact might be surprising to systems programmers so probably it's the right choice (even if destructors themselves are kind of mind boggling).

> This is just consistency. Why have block-as-an-r-value as the only case when a semi-colon isn't allowed? Furthermore, there's places for blocks that are formally r-values, but perfectly legitimately have type (), like the arms of match blocks and bodies of closures. And, lastly, not having a special case like this makes writing macros easier.

Well mostly I would deal with all this by removing "everything is an expression". Match blocks would just be regular blocks like `switch` in C or what have you, and so on.

> Match is necessary for working with `enum`s, and enums are great tool for avoiding allocations (e.g. Option<T> instead of a T* that's possibly null) and generally for guiding towards type safety. Having a single entity that does the complete deconstruction of an enum is important with move semantics, or else one would be forced to do a lot of extraneous nested "as_mut"/"is_none" etc. checking, especially for nested deconstructions.

Well switch/if/else have worked fine for a long time; and now C/C++ compilers warn you on missing branches when switching on enumerated values (Rust won't if you have a fallthrough case). But I prefer switch/match to if/else because if/else are too general for just switching on a variable (which is why I'm very dismayed at match guards -- they wholly abrogate the benefit of match over if/else), and I think switch's behavior is generally too restrictive (this is the only problem I have with switch provided you use proper blocks instead of the souped-up goto it really is). Match has so many features baked into it that every time I see one I have to stop and take a deep breath. It's very much geared towards writing and not reading, I feel.

> [match guards are] very convenient for conditionalizing based on enums... and yes, it's unrestricted, but that's for consistency: why restrict it?

I kind of went into this above, but in a language that doesn't have a good match/switch (like Python) you'll frequently run into code like this:

    if value == 1:
        # do a thing
    elif value == 2:
        # do a thing
    elif value in range(3, 20):
        # do a thing
    elif totally_unrelated_function_call() and other_thing == 98:
        # do a thing
    elif value >= 20:
        # do a thing

But when you have switch, you can't run that 4th conditional and have it shortcircuit the 5th conditional. Switch is, in that way, very much specifically for breaking down enums, so when I see one I can restrict my thinking to that variable and that variable alone.

Unless, of course, we're using match guards. Then I have to consider everything again, and I wonder why we're not just using if/else. Of course I understand that match is an expression so that's another "benefit", but if/else also have that behavior in Rust and I would get rid of that anyway.

> Do you don't like binding any variables in a match, or specifically doing it with @? I don't think you can use a let to emulate it, at least not without a lot of clunkiness.

I don't think this is too bad:

    fn main() {
        println!("Tell me type of person you are");

        let my_age: age();

        match my_age {
            0             => println!("I'm not born yet I guess"),
            // Could `match` 1 ... 12 directly but then what age
            // would the child be? Instead, bind to `n` for the
            // sequence of 1 .. 12. Now the age can be reported.
            1  ... 12 => println!("I'm a child of age {:?}", my_age),
            13 ... 19 => println!("I'm a teen of age {:?}", my_age),
            // Nothing bound. Return the result.
            _         => println!("I'm an old person of age {:?}", my_age),
        }

> Rust does have a file-based approach.

Sure but I guess my argument is the file/folder hierarchy approach combined with mod is a little clunky. I think it's clearer to just declare a module at the top and let people use whatever folder structure they want.

> Visibility/access control has been an endless argument in Rust.

Hah, OK fair. I guess you can't please everyone ;)

> I take it you want the idiom to be baked into the language to be slightly nicer.

Yeah like `type` or some such. It's a little weird to kind of overload struct this way (and don't get me going on enum haha -- it's a variant!!!!!!!!!!! it's a union!!!!!!! it's anything other than an enum!!!!!).

> People complained endlessly about the impenetrable sigils.

Oh yeah, I was one of them. But my complaint was all the extra sigils. There was `~` for an owned pointer, `@` for GC pointers, and the `mut` suffix for mutable versions, and the worst sin of all was that `*` was strictly for unsafe pointers. So the one you're most likely to recognize is the one you'll basically never see. Booooo.

But I guess mostly what it comes down to is that "everything is an expression" greatly weirds the language for me, match is a super feature, and there are weird tricks that don't really make sense like "Use _ as the default case in a match" and "if you're destructuring in a match and you don't care about some fields, just use `..`". At least in switch, the default case is called "default".

dbaupp · on March 1, 2018

> Ehhh I think it's not that bad, no harder than arithmetic expression parsing certainly. It's just one more grammar rule. Plus Rust supports everything that's needed already because of operator overloading.

No, it forces you to have a cover grammar for things that could be either a type or an expression, i.e. you need to be able to parse the superset of both possibilities, from when you see the '('. This isn't something Rust needs or has at the moment.

> They (well, destructors anyway) are less explicit though. When you use a context manager in Python you know it's cleaning things up. When you "use" a destructor in Rust you usually don't ever know you did. This can get you into trouble if you're relying on RAII to cleanup after you: you can get a handle to things and then enter a long loop or call chain. Sure you can avoid that by calling `drop`, but if context managers were the idiom this kind of thing would never be an issue. But I admit the difference is pretty small -- and in fact might be surprising to systems programmers so probably it's the right choice (even if destructors themselves are kind of mind boggling).

Yeah, that's why I said "most". :)

It's fair that destructors are implicit, but context managers are used for a lot of bread-and-butter clean-up like `with open(filename) as f:` etc, for which the destructor barely does anything.

Additionally, context managers end up being super "infectious": they're much harder to store, manipulate and return than an object with a destructor. E.g. how do you write a function f that opens a file and returns it for the user to use (maybe it does something tricky to find which file to open, or something)? You'd need coroutines or passing a closure into f to be able to manipulate the file handle while its context was open. I don't think the infrastructure required is the right trade off for a systems language (it seems like it'd end up with a fairly strong compile-time vs. runtime performance trade-off, and Rust's compile times are bad enough as they are).

> Well mostly I would deal with all this by removing "everything is an expression". Match blocks would just be regular blocks like `switch` in C or what have you, and so on.

Then you end up with a pile of ceremony and junk, with 'return's everywhere (some functions would get 50% larger/more noisy, just from the 7 characters "return "), and, as with most things, it makes macros and generating code more annoying. However, just to be clear, these changes you're suggesting are making Rust less consistent.

There are many places where Rust is different to C and C++, but it is also more consistent, which is one of your complaints. (You can see "everything is an expression" style of thinking has benefits even in C: the classic do ... while(0) trick for macros, plus GCC's statement expressions.)

> Well switch/if/else have worked fine for a long time; and now C/C++ compilers warn you on missing branches when switching on enumerated values (Rust won't if you have a fallthrough case). But I prefer switch/match to if/else because if/else are too general for just switching on a variable (which is why I'm very dismayed at match guards -- they wholly abrogate the benefit of match over if/else), and I think switch's behavior is generally too restrictive (this is the only problem I have with switch provided you use proper blocks instead of the souped-up goto it really is). Match has so many features baked into it that every time I see one I have to stop and take a deep breath. It's very much geared towards writing and not reading, I feel.

It's not clear to me how much Rust you know from this sentence: you Rust enums do more than C/C++ ones? Each case can contain data, and match is the only way to get at that data. There's no other way to conditionally deconstruct an enum down into its parts, other than 'if let' and 'while let' but those have the problems of if/else.

Having data is the main motivation for match guards: to conditionalise on things that only make sense for that arm. Matches get matched in order, meaning if a guard fails it falls through to check the next one, this means that guards even on data not from that arm are useful (e.g. maybe an arm only applies in certain cases, in which case other variants should take precedence). However, yes, match guards are rare.

In any case, my experience is almost all matches are simple pattern matching, there's no guards, no @s. It's theoretically possible (and occasionally occurs, sure) that someone writes a ridiculous match using its 3 separate constructs, but that's true of many things? I personally find the most annoying thing is how deep the code of a match ends up being indented.

Lastly, I hate the "X has worked fine" arguments: "mail has worked fine for for a long time, why do we need email". It feels like an intellectual shortcut to cut off discussion: if X has worked fine, it should be easy enough to defend it in comparison to the new thing (which, to be fair, you do :) ). In any case, match recognizes that switch has worked fine, and does what it can do (except fall-through, but pattern-alternation with | covers most of why fall-through is used in practice).

> Then I have to consider everything again, and I wonder why we're not just using if/else

Because it fundamentally doesn't work with enums.

Also, I feel any restriction is pretty pointless because you can always have dummy use of a value:

  fn always<T>(_: &T) -> bool { true }
  match x {
    Enum::Variant(a) if global && always(&a) => { ... }
    ...
  }

It's fair that people usually wouldn't do this, but still, it seems less consistent: it's special-casing the scoping rules for expressions in a match-guard, for somewhat arbitrary "code style" reasons.

> I don't think this is too bad:

That's not the impossible/difficult cases: nested matches are:

  let foo: Option<AnEnum> = ...;
  match foo {
    Some(inner @ AnEnum::Variant(_, _ , _)) =>  { ... }
    _ => { ... }
  }

Doing this with just a 'let' requires matching each contained value of the nested pattern and then reconstructing that whole thing.

However, I agree that @ is barely useful and it's definitely rarely used.

> Sure but I guess my argument is the file/folder hierarchy approach combined with mod is a little clunky. I think it's clearer to just declare a module at the top and let people use whatever folder structure they want.

This is again something that's argued endlessly about (and I think? there's been recent work/proposals to change it).

I personally find the core pub/no-pub + mod + use system (of 1.0, I've lost track of the various additions) is nicely minimal, with those three pieces that fit together quite orthogonally/consistently. This has benefits like ease of navigation and consistent behaviour between projects, rather than the C++ style of a namespace splattered across hundreds of headers.

However, it's definitely true that it is clearly clunky and hard-to-use for a lot of people.

> Oh yeah, I was one of them. But my complaint was all the extra sigils. There was `~` for an owned pointer, `@` for GC pointers, and the `mut` suffix for mutable versions, and the worst sin of all was that `` was strictly for unsafe pointers. So the one you're most likely to recognize is the one you'll basically never see. Booooo.*

So a sigil for Box/owned pointer is okay, but not for any other library-defined types? The current behaviour is consistent: types built deeply into the language (& and &mut are the building blocks of safety, and const/mut are the building blocks of every other pointer; both of which are completely dependency-less: no allocations, etc.) get sigils, and those that are plain library types do not.

While it's fair/a little weird that raw pointers get s, I think it's fairly defensible, for a few reasons: Box is quite rare in Rust (& and &mut are used most often, when pointer-like objects are needed), raw pointers are often used when close to C so there's a sense in which not using would be "being different for no real reason", and `unsafe` code is unpleasant enough as it is to read and write using long verbose types wouldn't help. But I would also think it's defensible to not have raw pointers have sigils.

> Use _ as the default case in a match

NB. this is also consistency: _ is "match any value" in every pattern, whether as the last arm of a match or elsewhere. (And, to be clear/linking to the next point, it's always match any single value.)

> if you're destructuring in a match and you don't care about some fields, just use `..`".

While it's fair that .. is a little impenetrable, what's the alternative? Listing every field? Not writing anything at all, and having no reminder/indication that there's ignored data (and also no help with refactoring when adding fields to the type)?

---

I'm probably seeming kind-of ranty here, but I think a lot of these sort of "Rust isn't consistent/is too complicated" discussions come down to familiarity. Don't get me wrong, it's definitely unfortunate that it's unpleasant to write when one is unfamiliar (would be way better if it was smooth from the start), but there is a core consistency.

I also think it's worth separating out "Rust is complicated" and "Rust is different", although the consequence of the two probably end up being similar in a lot of cases (hard to build an accurate mental model because things are unexpected).

camgunz · on March 1, 2018

> I'm probably seeming kind-of ranty here, but I think a lot of these sort of "Rust isn't consistent/is too complicated" discussions come down to familiarity.

Not at all! I'm honestly really grateful you're engaging. Let me get home and I'll respond fully :)

camgunz · on March 2, 2018

> No, it forces you to have a cover grammar for things that could be either a type or an expression, i.e. you need to be able to parse the superset of both possibilities, from when you see the '('. This isn't something Rust needs or has at the moment.

100% agree, but I think it's probably worth the extra complexity in Rust's implementation to restore familiarity with casting.

I don't know if this is already a concept somewhere (I feel like it has to be) but I think that given a software or information encoding problem, there's a certain base complexity. You might be able to solve that problem in multiple different ways, splitting up the complexity in each one, but the total amount of complexity is still there.

Memory management is a good example. Memory must be managed somehow, and in languages like Python, Java, and even Rust (with lifetimes and (A)Rc) the complexity of that management is in the language/platform implementation whereas in languages like C it's in the application. Regardless, it exists.

So I would prefer that this complexity be in the implementation in order to maintain familiarity with the long history of systems and applications languages. I recognize it's more work for Rust, but as a user of Rust and not a maintainer, I'm OK with that ;)

Re: Context managers, I think we agree here and in fact, basically all I want out of a context manager is Rust's blocks. I guess a block without a statement is a little strange, but actually in Rust it's kind of idiomatic so I can get behind it.

> Then you end up with a pile of ceremony and junk, with 'return's everywhere (some functions would get 50% larger/more noisy, just from the 7 characters "return "), and, as with most things, it makes macros and generating code more annoying.

Woof, I do _not_ consider `return` noisy. Along the same lines as "everything is an expression is weird", I think implicit returns are weird. I guess it's maybe like everything; like when you work in Python you think braces and semicolons are annoying noise, when you work in Java you think manual memory management is annoying noise, and now maybe that goes for `return` in Rust. I'm almost never irritated by "ceremony" -- I'm pretty good at typing. Instead it's the "neat tricks" and inconsistent structure of programs that really eats my time and burns my brain cycles.

I'll admit to not being a huge fan of macros -- especially in systems languages. I think inlining functions is far less surprising, and the fact that you have reduced power compared to macros means there's far less surprising behavior (i.e. "why am I returning early..."). I know they're good for getting rid of "ceremony" but I'm guessing we'll end up disagreeing about how important that is :) But consequently I'm not really willing to give up anything to make macro writing easier.

> However, just to be clear, these changes you're suggesting are making Rust less consistent. There are many places where Rust is different to C and C++, but it is also more consistent, which is one of your complaints. (You can see "everything is an expression" style of thinking has benefits even in C: the classic do ... while(0) trick for macros, plus GCC's statement expressions.)

Haha well, I'm not gonna defend do/while(0). Textual macros and optional braces are obviously (now) not a good idea.

Consistency is fine as long as it's good consistency. Sure C mixes a lot of statements with a few expressions, but that never bothered me because that's practically all mainstream languages. "Everything is an expression" is consistent, sure, but at what cost?

> There's no other way to conditionally deconstruct an enum down into its parts, other than 'if let' and 'while let' but those have the problems of if/else.

I guess really what I want is for there to be a construct that switches between different enums, and there to be a different construct that switches between different values. Conflating the type with the value is confusing to me. Ex:

    fn inspect(thing: ThingEnum) {
        match thing {
            case ThingEnum::ThingOne {
                switch thing.thing_one_field {
                    case 1 {
                        // do ThingOne.thing_one_field == 1
                    }
                }
            }
            // etc.
        }
    }

Yeah it's a little pyramid-y, but hey welcome to matching and variant types. If you really worked things around you wouldn't need to nest so far, but that's probably too much of a syntax change:

    fn inspect(thing: ThingEnum) {
        thing=>variant(ThingEnum::ThingOne) {
            switch thing.thing_one_field {
                case 1 {
                    // do ThingOne.thing_one_field == 1
                }
            }
        }
        // etc.
    }

Anyway there are a lot of benefits. The distinction between switching on type and value is very clear. Blocks and control flow are very clear. It uses previously standard constructs (switch/case). I really don't need to know anything about Rust to know how this works; it is self-evident. Really maybe the only ambiguous thing is "match", which should probably be like "variant" or something, but whatever.

> In any case, my experience is almost all matches are simple pattern matching, there's no guards, no @s. It's theoretically possible (and occasionally occurs, sure) that someone writes a ridiculous match using its 3 separate constructs, but that's true of many things? I personally find the most annoying thing is how deep the code of a match ends up being indented.

100% agree.

> That's not the impossible/difficult cases: nested matches are:

    let foo: Option<AnEnum> = ...;
    match foo {
        Some(inner @ AnEnum::Variant(_, _ , _)) =>  { ... }
        _ => { ... }
    }

While I get what this does, it looks very noisy. Losing the ability to do this in a single construct is fine w/ me if this is the result.

> ...C++ style of a namespace splattered across hundreds of headers.

That's a fair point and worth worrying about, haha. Good call.

> So a sigil for Box/owned pointer is okay, but not for any other library-defined types?

Mostly I just didn't think the (A)Rc pointers needed a sigil, and I thought it was weird that what most people would think of as a pointer used to use `~` and now uses `Box`, whereas the pointer you would (mostly) never use is the one with the most familiar sigil (`* `). Feels like that one should be `Raw` and `Box` should be `* `. Library-defined or otherwise doesn't really matter to me; and if you're building something that can't allocate, the compiler will just tell you when you can't use `alloc`/`new` or whatever. EZ.

I get what you're saying about unsafe code being closer to C though. I guess I would have made regular Rust closer to C and had unsafe be less ergonomic, as an interesting way to discourage people from using it (see Python's prolific use of `__` everywhere), so that's probably the root of our disagreement here.

> While it's fair that .. is a little impenetrable, what's the alternative?

I would get rid of struct destructuring entirely. It's only real use is inside of match, and that's packing more things into match.

> I think a lot of these sort of "Rust isn't consistent/is too complicated" discussions come down to familiarity

Oh I'll definitely cop to being 1000x better at C than I am at Rust, and the more I use it the more I'm fine with it. But moving from C to Rust (or any language to Rust) is so hard because of all of these things. I mostly work in C, Python, Java, and JavaScript and Rust is very different from all of those -- and it's hard for me to justify those differences. And as a C programmer, the borrow checker isn't responsible for the learning curve. Rather, it's all the "neat" things in Rust. All I really wanted was C with a borrow checker, or Java without GC and a 90s idea of OO (hand waving a lot here). I honestly don't see why we had to tack on all this extra stuff.

> I also think it's worth separating out "Rust is complicated" and "Rust is different", although the consequence of the two probably end up being similar in a lot of cases (hard to build an accurate mental model because things are unexpected).

Definitely. Point taken :)

dbaupp · on March 2, 2018

> 100% agree, but I think it's probably worth the extra complexity in Rust's implementation to restore familiarity with casting.

This is framing it a trade-off between writing something complex once versus forcing everyone to handle that papercut. Which, usually, I'd agree with the going with the complex-but-only-once (a question of asymptotics, after all).

However, I'm not sure it's entirely like that in this case: at the very least, this complexity here is revealed to the programmer (they have to do the same parsing switch in their head, even if it's usually fairly obvious). Additionally, this complexity applies, somewhat, to tools that work with code too, not just the compiler (e.g. limited editors trying to do syntax highlighting without running more detailed semantic analysis). This seems like such a minor thing to introduce such a heavy penalty, but maybe there's something more annoying you're finding with 'as'.

> Woof, I do _not_ consider `return` noisy. Along the same lines as "everything is an expression is weird", I think implicit returns are weird. I guess it's maybe like everything; like when you work in Python you think braces and semicolons are annoying noise, when you work in Java you think manual memory management is annoying noise, and now maybe that goes for `return` in Rust. I'm almost never irritated by "ceremony" -- I'm pretty good at typing. Instead it's the "neat tricks" and inconsistent structure of programs that really eats my time and burns my brain cycles.

Sure, I can see that; like any symbols, the brain quickly glazes over keywords like 'return', but it's still a little bit of processing. In any case, implicit returns do feel a bit weird to me at times, but Rust is statically typed, and the returns are never in surprising places (i.e. it's always the last thing in a function/block), which means I don't have to think about it: if it type checks, it's usually what I meant.

I really do think this is just a familiarity thing: coming from languages with a strong statement vs expression distinction it's weird, coming from mathematics/languages with out the distinction, it isn't. The language isn't particularly more complex because of it, it is just slightly different.

It's true that Rust's target market is mostly the former set of languages, so one could argue maintaining familiarity is critical (something Rust acknowledges: {} for scope and <> for generics driven by that), but it's also an argument that would have kept us writing slightly improved assembly languages forever.

> Consistency is fine as long as it's good consistency. Sure C mixes a lot of statements with a few expressions, but that never bothered me because that's practically all mainstream languages. "Everything is an expression" is consistent, sure, but at what cost?

Yes, what cost? I genuinely don't see a cost other requiring some people to get used to it, and I do see costs to the other approach.

I personally hate the C/C++ pattern of having to declare things and then initialize them later. The C++ code I'm currently writing has several places where I've been forced to write things similar to:

  const char *name;
  switch (someEnum) {
  case X: 
    name = "...";
    break;
  case Y:
    name = "...";
    break;
  // ...
  }

I personally find the following to be so much nicer:

  let name = match someEnum {
    X => "...",
    Y => "...",
    // ...
  };

In particular, everything is together, I'm not having to skip over the low-information-density "case", "break" and "name =" to find the interesting bits (the enum variant and the string it corresponds to), plus I'm not having to what reconstruct those two statements are actually trying to do (it's just initializing name).

An additional, although possibly contentious, benefit is this lets type inference work: name didn't need a type. This is more important when the type is long and complicated: type inference lets one use that complicated type without having to write it out, whereas in the declaration version, one might be tempted to go to a simpler type for development ease (or duplicate code, to not have to have 'name' live outside the switch) even if it is slower (e.g. collecting an iterator to a Vec).

(I also forgot to mention ternary ?: in C: it's also partly an acknowledgement that if-as-a-expression is useful.)

> I guess really what I want is for there to be a construct that switches between different enums, and there to be a different construct that switches between different values. Conflating the type with the value is confusing to me. Ex:

At the very least, this makes type checking harder (both for compilers and for people trying to write/understand the code, and compiler error messages): the 'thing' variable doesn't have type 'ThingEnum', it has a changing type that starts as 'ThingEnum' but switches to some restriction of that in a branch.

This opens a whole can of worms about wanting `if x is ThingEnum::ThingOne` to also restrict the type or `assert!(x is ThingEnum::ThingOne)`, and then wanting this type restriction thing to be more first class (e.g. abstracting it away behind functions, like `x.is_some()` succeeding "setting" x's type to Option::Some so that one can access the contained value).

This all sounds great and useful! And, it practically all already exists naturally with Rust's (and that of most other languages with similar enums[1]) current model: the enum variant itself is a first class way to reason about the various variants, and extracting the data (or at least, binding the thing to a new variable) at the point you find out the variant means there's never a worry about making sure the compiler understands all the ways in which to decide that an enum value is actually a specific variant.

I don't see much upside to emulating C-style manual tagged unions here.

[1]: This is a point where other people would complain about breaking with other practice for no good reason too (anyone who had used Haskell or OCaml or similar would find this system unnecessarily clunky).

Focusing on a single value also doesn't generalize/scale: for instance, if one is making a decision that depends on more than one value:

  fn maybeAdd(x: Option<i32>, y: Option<i32>) -> Option<i32> {

    match (x, y) {
      (Some(left), Some(right)) => Some(left + right),
      (Some(left), None)        => Some(left),
      (None, Some(right))       => Some(right),
      (None, None)              => None
    }

  }

(One could, theoretically, merge the two Some/None lines, with an | pattern, I like it like the above, since the RHS is so simple. The last 3 lines of this specific example could be reduced to "(left, None) => left, (None, right) => right", but that doesn't apply generally so I didn't do it for this one.)

Under a separated scheme (plus returns) the match looks like:

    match x {
      Some => match y {
        Some => Some(x.value + y.value),
        None => Some(x.value),
      }
      None => match y {
        Some => Some(y.value),
        None => None,
      }
    }

Similar to the "name" example above, I find this doesn't clearly express how I think about code, and I have to reverse engineer it: I don't want to follow the tree to see "if x is some and y is some then add, otherwise [y is none] so return x, ...". That's how a computer thinks, but that's not how I want to think about my code in almost all cases: usually I want to know the task the code is doing, not the details of every little step (the latter only in cases of micro-optimisation of a tight loop/hot function: if necessary, I can still write the Rust in that form). The Rust more declaratively expresses "if they both exist, add the values, otherwise if one of them exists, return that, otherwise return nothing".

Also, slightly related, but there has been semi-regular discussions around allowing an individual enum variant to be treated as a struct, essentially, so that instead of needing to define a whole new struct for any variant that might need to be manipulated on its own, one can refer to the variant directly:

  struct Foo { x: i32, ... }
  enum Bar { 
    Foo(Foo),
    ...
  }
  enum Baz {
    Foo { x: i32, ... } 
    ...
  }

Then Baz::Foo would be a struct-like type. (However, any proposal here would still require a match and new variable bindings and so on, because that avoids all the problems of having things have variable types.)

(Continued...)

dbaupp · on March 2, 2018

(...)

> While I get what this does, it looks very noisy. Losing the ability to do this in a single construct is fine w/ me if this is the result.

FWIW, I agree, just demonstrating that replacing @ in patterns isn't the easy case as it is with the classic integer range example you gave. :)

> Mostly I just didn't think the (A)Rc pointers needed a sigil, and I thought it was weird that what most people would think of as a pointer used to use `~` and now uses `Box`, whereas the pointer you would (mostly) never use is the one with the most familiar sigil (` `). Feels like that one should be `Raw` and `Box` should be `* `. Library-defined or otherwise doesn't really matter to me; and if you're building something that can't allocate, the compiler will just tell you when you can't use `alloc`/`new` or whatever. EZ.*

The one I think of as a pointer is & and &mut. And, there's a whole pile of reasons that Box doesn't get used nearly as much as * and malloc in C:

- arrays and non-owning pointers use different syntax (Vec/[] and &/&mut respectively)

- proper generics means much fewer places where one needs to create a void* to pass data around (e.g. std::thread::spawn vs. pthread_create)

- enums means polymorphism and optional-ness can be done with "inline" types with little ceremony.

But yes, as I said, it is a little weird that the dangerous pointers gets relatively nice syntax.

> I would get rid of struct destructuring entirely. It's only real use is inside of match, and that's packing more things into match.

If you're including tuples as structs, I strongly disagree: it's great for `let` and multiple returns (and even true structs, not tuples, are sometimes nice there, although 'foo'/'bar' has less benefit versus a struct's 'x.foo'/'x.bar' than versus a tuple's 'fooAndBar.0'/'fooAndBar.1'). Also, .. works for struct enum variants: the Baz::Foo variant above is valid Rust syntax now.

> Oh I'll definitely cop to being 1000x better at C than I am at Rust, and the more I use it the more I'm fine with it. But moving from C to Rust (or any language to Rust) is so hard because of all of these things. I mostly work in C, Python, Java, and JavaScript and Rust is very different from all of those -- and it's hard for me to justify those differences. And as a C programmer, the borrow checker isn't responsible for the learning curve. Rather, it's all the "neat" things in Rust. All I really wanted was C with a borrow checker, or Java without GC and a 90s idea of OO (hand waving a lot here). I honestly don't see why we had to tack on all this extra stuff.

Yeah, it's definitely true that new things have a "change budget": how different they can be before it's too much. Rust has been designed with this in mind (things like {} and <>, as I mentioned above), but different people's threshold for it are different. There's lots of changes in Rust over C that make code more declarative and require less mental reconstruction (for me), which means I enjoy writing Rust more than if it didn't have them, but it's definitely true that they aren't literally necessary, or that they could maybe be phrased in more restricted ways that match closer to C.

I've personally found being exposed to many very different languages has helped inform the code I write in all others. Each new one definitely takes some getting used to, but my experience is that having touched several different paradigms has made me both more flexible (less attached to any particularly way of doing things) and a deeper understanding of even the "boring" languages: the trade-offs and "whys". There's value to things being the same, but there's also value in being able to break out of that mold and doing new things even if there's not an obvious benefit when focused on the old style.

I hope that you find Rust more and more enjoyable as you use it more, and if not, that's unfortunate: not everything works for everyone. Hopefully, at the very least, Rust inspires other languages that suit you better. :)

trishume · on March 6, 2018

Just chiming in that I really like your absurdly long HN comments about Rust, and that while I have a script that gives me an RSS feed of them, others don't.

You should consider a blog where you just copy-paste long HN comment threads about Rust you find yourself writing.

dbaupp · on March 13, 2018

Thanks for the kind words! I do have a blog with various writings about Rust (link in my profile), but I don't think publishing comments would be appropriate for me right now (at least, not without more work). :)

camgunz · on March 6, 2018

> However, I'm not sure it's entirely like that in this case: at the very least, this complexity here is revealed to the programmer (they have to do the same parsing switch in their head, even if it's usually fairly obvious). Additionally, this complexity applies, somewhat, to tools that work with code too, not just the compiler (e.g. limited editors trying to do syntax highlighting without running more detailed semantic analysis). This seems like such a minor thing to introduce such a heavy penalty, but maybe there's something more annoying you're finding with 'as'.

To be honest, I really think you're blowing the complexity of this way out of proportion. Plus like, it's not like Rust's grammar is this incredible thing of beauty: look at `where` clauses or lifetime syntax. If Rust were really optimizing for human readability, it's hard for me to imagine this is the result. It's purely a style thing, just like `let` instead of `var` and so on, and I don't think sacrificing programmer familiarity for a designer's idea of style is a good tradeoff.

> I really do think this is just a familiarity thing: coming from languages with a strong statement vs expression distinction it's weird, coming from mathematics/languages with out the distinction, it isn't. The language isn't particularly more complex because of it, it is just slightly different.

I disagree; I think the language is significantly more complex as a result. "Everything is an expression" leads to a lot of assigning out of `if` and `match` expressions, consequently there's a ton of pressure to dump a lot of things into them. `match` in particular is really just a regex for variables, except 100x bigger. In most ways I consider that a regression.

> It's true that Rust's target market is mostly the former set of languages, so one could argue maintaining familiarity is critical (something Rust acknowledges: {} for scope and <> for generics driven by that), but it's also an argument that would have kept us writing slightly improved assembly languages forever.

I don't think that C/C++ are being held back because they have statements though. My whole point is that Rust fixes the main issues with C and C++ (weird tricky behavior, super dangerous memory management, data races, concurrency, no/bad standard library), but then for considerably less gain tacks on a lot of functional programming ideas.

To be clear I'm not at all against functional programming. I just think most systems programmers aren't functional programmers because there really haven't been (and still aren't, as Rust isn't really functional) functional systems languages. And because Rust doesn't have the benefits of a lot of functional languages (super cool Lisp macros, etc.) I struggle to see the point beyond a style preference.

> I personally hate the C/C++ pattern of having to declare things and then initialize them later.

Like this for example. Do you hate it enough to pull all the statements out of a language? Feels like time that could be better spent somewhere else.

> Yes, what cost? I genuinely don't see a cost other requiring some people to get used to it, and I do see costs to the other approach.

The vast, vast majority of programmers come from languages with statements, and one of the main complaints about Rust is its learning curve. I think that's a significant cost.

Like, when I build an application in Java, Python, or C, "I had to initialize a variable using a function instead of an if/match statement" is nowhere on the pitfall list. That's not what our industry struggles with. It struggles with program complexity, logic errors, concurrency, and architectural confusion. "Everything is an expression" does nothing to address those issues. Lifetimes and the borrow checker do, and I think those departures are great investments for systems programming. Assigning from a match solves a problem I never had.

> I personally find the following to be so much nicer:

  let name = match someEnum {
    X => "...",
    Y => "...",
    // ...
  };

Really the only reason I use `switch` so much in C is that there's not a lot of polymorphism. Otherwise I greatly prefer to have this logic internal to the thing I'm matching on:

  let name = someEnum.get_name();

But again, because `match` is the way Rust does everything, look it's another match expression.

> (I also forgot to mention ternary ?: in C: it's also partly an acknowledgement that if-as-a-expression is useful.)

Aha well, as you might imagine I really dislike the ternary -- in really any language. They're hard to read, hard to edit, too easy to make a mess... basically all my arguments about `match`. Honestly just use an if.

I don't want to keep repeating myself, but my main beef isn't readability or whatever. It's that I don't understand what it gets me beyond using if. If ternaries somehow (really) solved a NULL problem, or helped me with error handling, or let me avoid use-after-free, then those are all great and welcome tradeoffs. But all it does is save a couple of lines, and because my problem has never been that I had to use a couple extra lines I just don't care about features where that's the sole virtue. I need more.

> I don't see much upside to emulating C-style manual tagged unions here.

Well I guess my point is mostly that I want to differentiate between matching on a value and implementing inside-out polymorphism. I don't like that they're smashed together in match because I think they're very different, and it leads to a lot of mixed up logic inside match expressions. The point isn't to emulate tagged unions so much as it is to separate concerns.

> [1]: This is a point where other people would complain about breaking with other practice for no good reason too (anyone who had used Haskell or OCaml or similar would find this system unnecessarily clunky).

And I would totally agree with them if Rust were a language in the vein of Haskell or OCaml, but it's not; it's a mainstream systems language. If Rust's pitch were, "Hey, do you like Haskell? You'll _LOVE_ Rust!" then they'd have a valid complaint. But Rust's pitch is, more or less, "Hey are you tired of C/C++ pitfalls or slow Python/Ruby/JS code? Give Rust a whirl!" When optimizing for familiarity and shallow learning curve you gotta pick an audience; you can't have it both ways.

> Focusing on a single value also doesn't generalize/scale: for instance, if one is making a decision that depends on more than one value:

  fn maybeAdd(x: Option<i32>, y: Option<i32>) -> Option<i32> {

    match (x, y) {
      (Some(left), Some(right)) => Some(left + right),
      (Some(left), None)        => Some(left),
      (None, Some(right))       => Some(right),
      (None, None)              => None
    }

  }

Oooooh, I think this is a very good point, but honestly this is just half-hearted polymorphism. These two enums should be in a struct, and this logic should be in its implementation. Then I think it's fine to do something like this:

  def maybe_add(self):
      if self.left and self.right:
          return self.left + self.right
      if self.left:
          return self.left
      if self.right:
          return self.right

I mean, I don't want to pick apart a clear hypothetical example. My point is that we already have a tool that can handle those scaling concerns: if and encapsulation/polymorphism. And they're much, much better than match because they scale to more than 3-4 variables; past that and match is just terrifying.

> I've personally found being exposed to many very different languages has helped inform the code I write in all others. Each new one definitely takes some getting used to, but my experience is that having touched several different paradigms has made me both more flexible (less attached to any particularly way of doing things) and a deeper understanding of even the "boring" languages: the trade-offs and "whys". There's value to things being the same, but there's also value in being able to break out of that mold and doing new things even if there's not an obvious benefit when focused on the old style.

Hah, tell me about it! You should've seen me the day I discovered there were different "types" of numbers in C (coming from Python)! There's a lot to learn, no question.

But there's a reason behind C's proliferation of numeric types, one that aligns with its purpose. I don't dislike `match`, etc. because I'd never worked with it before. I dislike it because it encourages so many bad practices in order to solve a problem I never had.

> I hope that you find Rust more and more enjoyable as you use it more

I do actually. Maybe you're getting this, but I identify more as a grump when it comes to programming so I gravitate towards grumpier languages (C, Go, etc.) so Rust's eagerness about some of the ML stuff is a little off-putting. But really, compared to integer promotion in C like, `match` is nothing :)

dbaupp · on March 13, 2018

> To be honest, I really think you're blowing the complexity of this way out of proportion. Plus like, it's not like Rust's grammar is this incredible thing of beauty: look at `where` clauses or lifetime syntax. If Rust were really optimizing for human readability, it's hard for me to imagine this is the result. It's purely a style thing, just like `let` instead of `var` and so on, and I don't think sacrificing programmer familiarity for a designer's idea of style is a good tradeoff.

Having a nice grammar in this respect is more of a technical beauty and elegance than an aesthetic one, but it translates into simpler tooling and so on, which can be an aesthetic one. To be honest, I also think you're blowing the value of having "(Type)value" syntax way out of proportion. It's not like Python or JavaScript use it, and everyone seems to cope fine, and even C++ (theoretically) prefers the far more clunky static_cast<T>(...).

> Otherwise I greatly prefer to have this logic internal to the thing I'm matching on:

Oh, yeah, obviously abstracting out into functions for common functionality is better than ad-hoc matches, but there's lots of cases where that's just overhead and fiddly and less clear. People already complain a lot about Rust requiring needless ceremony and being unergonomic, and encouraging people to go through the ceremony of defining functions for little things seems to be going against that.

> Do you hate it enough to pull all the statements out of a language?

Rust has statements: match, if and the loops can be used as statements, and you're actually free to write your assignments C/C++ style (type inference even works for it):

  let name;
  match someEnum {
    X => name = "...",
    Y => name = "...",
    ...
  }
  doSomething(name)

Moving the "name =" out of the match seems like a tiny step that makes the code less noisy and nicer. But, it is a style preference.

Rust has not pulled statements out of the language, just upgraded things from "always a statement" to "can be an expression".

> I don't want to keep repeating myself, but my main beef isn't readability or whatever. It's that I don't understand what it gets me beyond using if. If ternaries somehow (really) solved a NULL problem, or helped me with error handling, or let me avoid use-after-free, then those are all great and welcome tradeoffs. But all it does is save a couple of lines, and because my problem has never been that I had to use a couple extra lines I just don't care about features where that's the sole virtue. I need more.

The value of 'match'es for things where 'switch' or 'if' are reasonable is being declarative and uniformity with the cases where 'switch' and 'if' are not reasonable. It's fair that it's unfortunate that things can be misused to create confusing code, but I don't think that's a reason to remove something in and of itself, or else a language would have to be extremely small.

> Oooooh, I think this is a very good point, but honestly this is just half-hearted polymorphism. These two enums should be in a struct, and this logic should be in its implementation. Then I think it's fine to do something like this:

What do you mean by polymorphism?! Just being able to have "None" or a value in the same variable as in Python?

Contesting every example by saying that it should just be wrapped up into a struct/function is... kinda missing the point. You still end up with that code somewhere (although it's true for the name example being in a function means 'return' works), and, you end up with a ridiculous number of near-pointless functions and types. In any case, there's not nearly enough context to say that that example should be wrapped up into a type: what's the connection between the two values other than that they should be added? There's a reason no-one proposes writing `a + b * c` as

  (Add {
    left: a, 
    right: (Multiply { left: b, right: c }).doIt()
  }).doIt()

And this random example is only one step above plain arithmetic in terms of abstraction.

In any case, your proposed variant is... not a good version of my code. There's nothing stopping `if self.left: return self.right` (oops!) and it fundamentally doesn't fit with Rust's approach to conversions between types. As I said earlier, adding the type system features to defend against the first thing ends up, in the limit, being more complex than (but pretty similar to) having 'match' and using the existing type infrastructure. This is relevant to Rust's goals, both in being a more reliable systems programming language in general, and also for safety: with move-only types and stronger references, guaranteeing safety but still being useable I think would mean having a lot of that complicated infrastructure.

> And they're much, much better than match because they scale to more than 3-4 variables; past that and match is just terrifying.

I strongly disagree that 'if' scales to more than 3-4 variables, and that 'match' scales worse.

'match'-without-'if' is more restricted than 'if' and so is easier to understand (understand at a high-level): if I see a match on some variables, I know that it's going to be looking at those variables immutably, and structurally. The state of the variables at the start of the match completely determines which code runs. (And, but I guess you probably disagree with this, even with 'if's on arms, there's still not less structure than a plain sequence of 'if's. And, this lack of structure is clearly flagged, whereas with an 'if' chain, everything looks the same.)

But, with an arbitrary sequence of "if"s, it's a free-for-all, anything could happen up to and including mutation of things queried later, there's no static checking that I'm interacting with the variables I want to, and there's no single source of truth for what those variables even are (like the ... in match ... {), and there's no checking for things like handling all the cases. "if" having more power and so being worse is exactly the same reason that "goto" is frowned upon: it is too flexible and so too hard to understand. The same reasoning can be seen in C's 'switch' versus "if (x == Value) ... else if (x == OtherValue) ...".

It's true that an sequence of 'if's with lots of variables doesn't look particularly different to one with only a few variables (unlike Rust's match), but this is deceptive: it's going to at least as hard to understand what that if/else if chain is actually doing, not what it seems like it is doing, even in idealised cases (and, for fairness, if you're thinking the worst case of 'match' with @s and ifs, one really should be thinking of the worst cases of ifs).

---

Anyway, wrapping up this thread, you've convinced me that Rust could be a little more minimal ('match' doesn't need @ or 'if', and the convenience of everything-is-an-expression isn't needed), but I don't think there's even close to "50x" space for simplification.

There's a lot of consistency between various parts without many clunky interactions, which, I find, is where the most annoying complexity in programming languages appears. A lot of different to a major component of the audience, but a lot of that difference is bringing in conveniences from the last few decades of programming language research/experimentation.

It might be an interesting experiment for you to take a moderately large Rust program and convert it into "MISRust" (ala MISRA C, or maybe "misfit Rust" :P ), that doesn't use the C statements as expressions anywhere and just does single level 'match's without ifs or @s everywhere (etc.) just to see what it looks like. I suspect it wouldn't even be too hard to write a clippy-style lint that enforces all those rules.

smitherfield · on Feb 24, 2018

> Eh, I disagree...

The reason transmute is (particularly) unsafe is because it can be quite difficult to tell whether a variable is bound to a value, a (possibly implicitly-dereferenced) reference or a slice, and so, given transmute's implicit type inference, it's very easy to quietly get either the source or destination type very wrong.

The size-checking is also I would argue useless bordering on worse than useless (false sense of security) since any two references or two slices are the same size, but are unlikely to actually be interconvertible, and it also fails to check alignment.

C and C++ (particularly the latter) make it much more difficult to inadvertently cast between values and pointers, and the strict aliasing rules (aliasing almost always illegal except through `char`) are draconian enough to discourage the practice of type-punning-through-indirection (and ensuing alignment bugs) altogether.

`memcpy` (and unions-as-implemented) is nice because it's explicit and it just works.

> I'm curious if you've got a defect report link or similar

I'm having a hard time finding the detailed original paper pointing it out, but see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p059...

dbaupp · on Feb 24, 2018

> The reason transmute is (particularly) unsafe is because it can be quite difficult to tell whether a variable is bound to a value, a (possibly implicitly-dereferenced) reference or a slice, and so, given transmute's implicit type inference, it's very easy to quietly get either the source or destination type very wrong.

Ah, so just the type inference I mentioned? I agree, and certainly try to never let transmute infer when I use it.

However, I don't think one can end up with implicit dereferences or wildly unexpected types since there always has to be some context: the return value of transmute is completely unconstrained, meaning there needs to be something that suggests the type.

Additionally, C++ can suffer in a similar way (but not quite identical) due to auto:

    auto source = ...;
    T dest;
    memcpy(&dest, &source, sizeof source);

Or, in extreme cases: `auto dest = ...;`.

> The size-checking is also I would argue useless bordering on worse than useless (false sense of security) since any two references or two slices are the same size, but are unlikely to actually be interconvertible, and it also fails to check alignment.

... C-style memcpy is strictly worse than all of this.

But yes, maybe a false sense of security. But that's a bit like arguing that a safety guard on a buzz-saw doesn't stop all problems, and so is pointless.

I agree that failing to check alignment of pointer destinations is unfortunate, but it's only one of many many problems that can occur with `unsafe` and even `transmute` itself.

However, it's worth nothing that differing alignment between values being transmuted is not a problem: transmute::<[u8; 4], u32>(byte_array) is fine, despite the byte array only having alignment 1.

> C and C++ (particularly the latter) make it much more difficult to inadvertently cast between values and pointers, and the strict aliasing rules (aliasing almost always illegal except through `char`) are draconian enough to discourage the practice of type-punning-through-indirection (and ensuing alignment bugs) altogether.

I don't agree:

- there's little difference between C/C++ and Rust other than the inference thing. C++ has C-style casts, and reinterpret_cast.

- strict aliasing ends up being commonly violated for transmute-style casts (it's just so temptingly easy, and there's nothing that actually stops it compiling), and `unsafe` in Rust is also a fairly major discouragement to doing bad things.

> `memcpy` (and unions-as-implemented) is nice because it's explicit and it just works.

transmute is only fractionally less explicit (and it's well known bad practice to let it be completely implicit), and also just works.

> I'm having a hard time finding the detailed original paper pointing it out, but see ...

Thanks; interesting!

kazinator · on Feb 23, 2018

"safe" means more than just memory safety.

Free from deadlocks, for one thing; who cares if no memory is misused if the show locks up.

Also, free from problems like thread A only traversing half the list because B removed a node in the middle which derailed A into a null pointer that looked like the list terminator (Even though no memory was misused.)

amelius · on Feb 23, 2018

Also interesting is to figure out how Rust deals with closures, which reference the parent scopes, and how the corresponding (cyclic) data structures are managed.

Does Rust eliminate the cycles by copying? (expensive, and doesn't allow for writing)

dbaupp · on Feb 23, 2018

A closure in Rust is just a struct containing its captures either by value or as & or &mut references pointing to the values. Thus, making/avoiding a cycle with them is the same as doing it for "normal" values.

One can't accidentally get a reference-counting cycle (values aren't reference counted unless they're put into a pointer explicitly), and a cycle of shared/mutable references won't stop deallocation (and, if it's a cycle that's unsafe, it won't compile).

Retra · on Feb 23, 2018

Why would there be cycles? Rust captures closure environments however you like. You can copy them, share them, reference them, whatever.

kazinator · on Feb 23, 2018

Why would be there be cycles when you have closures?

Because a function's environment can end up having a reference back to the same function.

This can be set up without assignment, given just a lambda operator. Hint: look in domain name in the URL in the browser address bar.

viraptor · on Feb 23, 2018

If you used it in a way that allows the closure to escape the scope, you'd get one of the "foo can't outlive bar in score ...". (On a mobile, can't write an example easily)

kazinator · on Feb 23, 2018

If I can't have escaping closures, I want to be working in C.

viraptor · on Feb 23, 2018

You can use unsafe if you want to feel like working with C :-) There's nothing stopping you from doing things you know are correct. You just sometimes need to tell the compiler that you know better and guarantee that the code is fine.

bombela · on Feb 23, 2018

you can have "escaping" closure. But you cannot return something that outlives it's lifetime. So the closure must own data, not merely refer to it. The online Rust book should habe the details.

pjmlp · on Feb 23, 2018

A good example are event handlers on GUI widgets.

nicoburns · on Feb 23, 2018

It tracks the individual variables in the scope that the closure captures. This shouldn't introsuce cycles in general, no?

kazinator · on Feb 23, 2018

An individual variable, as such, can be a function value, such that the variable is visible in that function's environment.

gsg · on Feb 23, 2018

Mutually recursive functions can be closure converted without cycles by deriving closure values from each other rather than storing them in each other.

kazinator · on Feb 23, 2018

Environment copying ruses are revealed when something mutates a lexical variable, and the mutation doesn't appear everywhere as it should. (So you have to ban that.)

gsg · on Feb 23, 2018

First, deriving is just pointer arithmetic and doesn't copy anything. Second, standard flat closure representations already involve copying parts of environments, with any sharing problems addressed by assignment conversion (turning variables that are assigned to into mutable cells, a reference to which can be copied into however many environments is necessary).

kazinator · on Feb 24, 2018

> standard flat closure representations already involve copying parts of environments, with any sharing problems addressed by assignment conversion (turning variables that are assigned to into mutable cells)

In fact, the simple assoc list representation of environments (whereby simple consing extends the environment) does this. It doesn't eliminate circularity.

If a function has a certain binding in scope, and that binding refers back to the function, you can shuffle that binding around between different environment vectors all you want. Wherever you stick that binding, as long as the binding is in scope of that function, you have circularity.

GreaterFool · on Feb 23, 2018

You don't have to go as far as doubly-linked list. Writing a simple cons-list is hard enough:

    enum List<T> {
        Nil,
        Cons(T, Box<List<T>>)
    }

Imagine you're writing `Iterator`. You have a `&mut List<T>`. For `Nil`, you're done. For `Cons`, you take it apart, return the `T`, deref the `Box` and move your `&mut List<T>` to point to that value. Nothing could be easier, right?

Except in Rust you can't do that! One can resort to unsafe code or use ugly and inefficient workarounds to remain in safe-land.

anaphylactic · on Feb 23, 2018

I don't understand what your concern is - this only took me about a minute to write and it looks completely safe and efficient.

https://play.rust-lang.org/?gist=674f4b88876614f603fd70368cb...

GreaterFool · on Feb 25, 2018

Thanks for this snippet. Didn't think about that.

If I understand correctly that's overly restrictive though. You're limiting the lifetime of list elements to the lifetime of the spine of the list.

What I want is this:

    enum List<T> {
        Nil,
        Cons(T, Box<List<T>>)
    }

    struct IntoIter<T>(List<T>);

    impl<T> IntoIterator for List<T> {
        type Item = T;
        type IntoIter = IntoIter<T>;
        fn into_iter(self) -> Self::IntoIter {
            IntoIter(self)
        }
    }

    impl<T> Iterator for IntoIter<T> {
        type Item = T;
        fn next(&mut self) -> Option<T> {
            match std::mem::replace(&mut self.0, List::Nil) {
                List::Nil => None,
                List::Cons(x, l) => {
                    std::mem::replace(&mut self.0, *l);
                    Some(x)
                }
            }
        }
    }

But without the `replace` calls.

Also, in general I may be working with a data type for which I don't have a value I can conjure out of thin air (like `Nil`). What then?

jokoon · on Feb 23, 2018

I don't understand why there is a need for linked lists. I'm reading about fast insert in the middle, but there are other ways to insert data quickly. Maybe it's a need on hardware with specific memory management?

There are so many drawbacks to linked lists: cache incoherence, the use of pointers, no fast random access...

The single fact that rust makes it hard to implement a linked list should show that this data structure is a bad idea. Even when the C++ author is saying it, that should be enough, no?

amelius · on Feb 23, 2018

Firefox is written in Rust, and I suspect that their DOM implementation has backpointers (from children back to parents), for performance reasons. It might be interesting to check how they did it.

pcwalton · on Feb 23, 2018

This is a good question, even if the details are wrong (the question should be about Servo, not Firefox/Gecko). The answer is somewhat idiosyncratic: Servo uses the SpiderMonkey garbage collector to manage DOM objects, which, like all tracing GCs, can deal with cycles just fine.

This ends up simultaneously solving the ever-annoying problem of "how do you manage memory when both JS and Rust can hold strong references to objects?" (In Servo's case, the answer is simply "just punt all of the logic to the JS engine.")

viraptor · on Feb 23, 2018

Firefox is not written in rust. It has multiple small components which were rewritten in rust recently, but it's far from the whole system. Their DOM is still C++

dmitrygr · on Feb 23, 2018

  struct node{
    uint64_t val;
    struct node *next;
    struct node *prev;
  };

:)