Rust Performance Pitfalls (llogiq.github.io)
424 points by blacksmythe on June 8, 2017 | 109 comments



> To get rid of the checks, we can either use bytes directly (usually via Vec<u8> / &[u8]) or, if we are absolutely sure the input will be valid UTF-8, use str::from_utf8_unchecked(_) (note that this will require unsafe and break your code in surprising ways should the input not be valid UTF-8).

I believe this needs a stronger warning. Functions that operate on strings are allowed to assume that their input is valid UTF-8 and may perform out-of-bounds memory accesses if the input is not valid UTF-8. Therefore, creating a string containing invalid UTF-8 can lead to a memory safety vulnerability. The suggestion to use this function seems out-of-place in this article, which otherwise avoids suggesting unsafe code (e.g. indexing arrays without bounds checking).
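A minimal sketch of the contrast (inputs are illustrative): `from_utf8` validates and returns a `Result`, while `from_utf8_unchecked` skips validation entirely and makes the caller responsible for the UTF-8 invariant.

```rust
fn main() {
    let valid: &[u8] = b"hello";
    let invalid: &[u8] = &[0xFF, 0xFE];

    // Safe: validation is a linear scan; invalid input is an Err, not UB.
    assert!(std::str::from_utf8(valid).is_ok());
    assert!(std::str::from_utf8(invalid).is_err());

    // Unsafe: no scan at all. Sound ONLY because we just proved `valid`
    // is UTF-8. Passing `invalid` here would be undefined behavior --
    // downstream string code may read out of bounds.
    let s = unsafe { std::str::from_utf8_unchecked(valid) };
    assert_eq!(s, "hello");
}
```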


What sort of language beyond "absolutely sure", "require unsafe", and "break your code in surprising ways" would you recommend to make this warning stronger?


Explain what might happen, as I did in my comment. People are more likely to take heed of a warning if it's concrete.


"requires unsafe" already implies everything in your comment. `unsafe` literally means "this might violate memory safety"


No, I don't believe it does. There are perfectly safe operations that must be done in unsafe, because the compiler is not smart enough to determine they are safe. Thus when describing a potential operation, saying it "requires unsafe" does not imply it can blow up in your face, just that the compiler is not smart enough to determine that at this time.

This is exactly why warnings should strive to be as clear as possible. Just because you think it's not ambiguous doesn't mean you aren't missing some context that someone else might have, which makes the statement ambiguous for them.


> No, I don't believe it does.

This is mistaken, and an impression that we continually strive very hard to counter. The `unsafe` keyword is to be used in the process of writing (and therefore consuming) APIs if and only if those APIs have external unenforced invariants which, if broken, could cause memory unsafety. The reason that we strive to reinforce this so fervently is precisely because we want people to see the `unsafe` keyword and instantly become wary of memory safety violations (and undefined behavior in general); conversely, we want people to be able to view the absence of `unsafe` in code that they have written and have confidence that that code is memory-safe.

In particular, this means that `unsafe` is not to be used for operations that may be dangerous but that have nothing to do with memory safety. If a misused API could delete the production database, that's not unsafe. If a misused API could leak all your users' passwords, that's not unsafe. We had this argument before 1.0 regarding a few things in the stdlib that are more-or-less "dangerous" (e.g. `std::mem::forget`) that cannot be used alone to cause memory errors and arrived at the current strict interpretation deliberately.

Though, of course this requires social pressure to enforce (which is exactly what I'm doing here), as by definition unsafe code is something that the compiler itself cannot reason about.


> > No, I don't believe it does.

> This is mistaken, and an impression that we continually strive very hard to counter.

> In particular, this means that `unsafe` is not to be used for operations that may be dangerous but that have nothing to do with memory safety.

That's not what I'm talking about. I think you've misinterpreted my point. I'm talking about unsafe for memory access, but in ways that are provably (or, if not provable, accepted as) safe. For example, the standard libs that are unsafe under the covers, but expose a safe API.

Saying "you must use unsafe to accomplish this" is not equivalent to saying "this can cause memory unsafety". The latter is a subset of the former, and the only time that isn't true is if Rust is capable of completely identifying every case of safe memory access and only requiring unsafe for actual unsafe operations.

Since unsafe could be required for what is an entirely safe, and possibly provably so, set of actions, it depends on the context of the recommendation whether you believe that someone stating an action requires unsafe implies that actual problems could occur.
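As a sketch of the kind of thing described above -- a safe API whose body needs `unsafe`, yet upholds the invariant itself so no caller input can cause memory unsafety (the function is hypothetical, not from the thread):

```rust
// Sums the first even number of elements of `v`. The index is always in
// bounds by construction, but we assert that ourselves with
// `get_unchecked` rather than relying on the checked `v[i]`.
fn sum_even_prefix(v: &[u32]) -> u32 {
    let n = v.len() / 2 * 2; // largest even length <= v.len()
    let mut total = 0;
    for i in 0..n {
        // Invariant: i < n <= v.len(), so this access is in bounds.
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}

fn main() {
    assert_eq!(sum_even_prefix(&[1, 2, 3, 4, 5]), 10); // 1 + 2 + 3 + 4
    assert_eq!(sum_even_prefix(&[]), 0);
}
```

(In a case this simple the optimizer would likely elide the bounds check anyway; the point is only that the `unsafe` block is locally justified while the public API stays safe.)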


We must be talking past each other, because I'm still not certain what you're trying to say.

If you're trying to say "there exists Rust code that contains `unsafe` blocks that is memory-safe", then this is obviously (hopefully!) correct, because 100% of the time we hope that our `unsafe` blocks are correctly implemented. In the same sense that "valid" C code does not contain undefined behavior, "valid" Rust code doesn't either; therefore the correctness of `unsafe` blocks in Rust must be tautologically (if uselessly) guaranteed. But that's not a very useful statement. Bugs happen.

Conversely, if you're trying to say "there exists code that can only be written in `unsafe` blocks but that can never cause memory unsafety despite any modification to the surrounding code", then I'd say this is trivially refutable; I can modify any unsafe block to exhibit memory unsafety.

Finally, if you're just saying "static analysis must, by its very definition, reject some correct programs in order to guarantee that the programs it does accept are correct, and Rust's static analyses are no different" then, of course, this is true (again, by the nature of static analysis), but again this is not an especially useful statement, especially since this isn't what anyone here is disputing. Your original comment was made in reply to this statement by rabidferret: "`unsafe` literally means 'this might violate memory safety'". This is a true statement, both in a social context (see my original comment) and in an implementation context (see the second example from this comment).

The bottom line is, if you see an `unsafe` block, assume memory safety is at risk unless you're absolutely sure the author knows what they're doing.


> If you're trying to say "there exists Rust code that contains `unsafe` blocks that is memory-safe", then this is obviously (hopefully!) correct, because 100% of the time we hope that our `unsafe` blocks are correctly implemented.

Yes, and additionally there are some algorithms that are safe, but to implement require unsafe blocks. I think it's obvious that there are patterns of memory access that are safe, possibly provably so, but not by the compiler at this time.

Since a recommendation for someone to use unsafe could be either a general recommendation with a very broad caveat that quite a bit of additional care and thought is required, or a fairly benign recommendation to "do X, Y, and Z in this order, and while it requires unsafe, it's no less safe than when we do the same in C/C++", I think it's important to distinguish those, lest someone mistake the former for the latter.

What that boils down to in practice is that stating something requires unsafe is not sufficient by itself as a warning to denote the level of care someone should take. Different people will interpret that statement differently at different times on different topics. There is no need to leave that ambiguity standing when it is easy to clarify. That's all I was trying to express, in response to '"requires unsafe" already implies everything in your comment.'


> Yes, and additionally there are some algorithms that are safe, but to implement require unsafe blocks.

I believe kibwen's point is that every algorithm is safe (when implemented correctly), since any memory safety violations inside `unsafe` blocks are incorrect. The keyword indicates the compiler can't guarantee that there isn't any, not that memory unsafety is okay. Maybe an example of the sort of thing you're thinking of would clarify the distinction you're drawing.

> I think it's obvious that there are patterns of memory access that are safe, possibly provably so, but not by the compiler at this time.

This is true of "everything": given any task X, a sufficiently smart language/compiler could allow expressing it without the risk of memory unsafety (i.e. no use of `unsafe`).


Even if someone that you trust is telling you to use a specific unsafe API and that "it's no less safe than when we do the same in C/C++", proper responsible usage of that API always requires reading the documentation to determine which invariants must be upheld, if only because you ought to be documenting those exact same invariants (and describing the measures that you take to uphold them) in your own code. Don't trust anyone; Rust caters to the paranoid. :P


If someone says "here's a simple mergesort implementation. It requires unsafe in a few spots, for performance"[1] or even "you can implement a fairly normal mergesort, but you may want to use unsafe in areas A and B for speed", I view that differently than if someone says "you can access strings without confirming they are UTF-8 encoded with str::from_utf8_unchecked, but it requires unsafe".

One has, for the most part, a self-contained implementation and is part of a very well known algorithm. Checking that the implementation doesn't have any obvious flaws may be sufficient.

The other has implications that far outlive the small suggested bit of code. Until you've correctly made sure that anything resulting from this call conforms to UTF-8, there is a risk in its use in any number of string processing routines.

The article, to its credit, does mention that problem with UTF-8 when working with it unchecked, albeit vaguely. What I don't think would have been sufficient for an article that aims to help people would be to say "it requires unsafe" and leave it at that. That would be suggesting a routine for performance reasons without sufficiently addressing the real downsides.

Thus my assertion that simply noting that something requires unsafe is not sufficient to denote in all cases the consequences of the suggestion, as I interpreted rabidferret's comment to imply.

Another way of stating my argument is "when suggesting unsafe as a possible solution, it behooves you to mention any non-obvious consequences this specific suggestion might entail." That's a fairly uncontroversial view, in my eyes, so I'm not sure exactly why I've had to explain it four separate times now.

1: Like in the rust standard sort algorithm.


For what it's worth, I've used Rust for years and understand what you're saying and completely agree.


"saying it "requires unsafe" does not imply it can blow up in your face"

Good argument. If you advise someone to use unsafe the responsible thing to do is explain precisely what the consequences are besides "it's faster."


For any API that requires the `unsafe` keyword to use, improper usage of that API risks memory unsafety.


There's a subtlety here: the "actual" memory unsafety may manifest elsewhere, if the `unsafe` keyword just allows violating constraints that other (possibly-safe) functions rely on. That is, this touches on the whole "boundary of `unsafe` is the module", where functions marked `unsafe` might do perfectly safe things internally but break invariants that other pieces of code assume are true. From the view of "an API" == "an individual function", it is true that incorrect use of `unsafe` code may not risk any memory unsafety in isolation (e.g. one can pass any integer to Vec::set_len and nothing bad will happen in that call), but it is not so true in the more conventional broader view of an API.
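A sketch of the `Vec::set_len` contract the parent describes, shown being used correctly (the writes before the call are what make it sound; the comments note where misuse would surface):

```rust
fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(4);
    // set_len itself does nothing dangerous in isolation; it just writes
    // a length field. The invariant it asserts -- "the first len elements
    // are initialized" -- is what safe code elsewhere (indexing,
    // iteration, Drop) relies on.
    unsafe {
        let p = v.as_mut_ptr();
        for i in 0..4 {
            p.add(i).write(i as u8 * 10); // initialize before exposing
        }
        v.set_len(4); // sound only because those 4 elements are written
    }
    assert_eq!(v, [0, 10, 20, 30]);
    // Had we called v.set_len(4) without the writes, the memory unsafety
    // would only manifest later, in perfectly safe-looking code like v[3].
}
```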


Saying something "requires unsafe".

Literally means "something violates the safety model".

---

Here is the truth about safety guarantees. You can do a lot of valid things type systems prevent you from doing. You can do a lot of valid things if/else/for/while prevent you from doing.

But 99% of the time it isn't worth it. Following the rules, even with the strange BS they create, is easier than managing the mess of GOTOs you'll find yourself in a year or two.


> Literally means "something violates the safety model".

Yes. But the safety model is not capable of identifying whether something is or is not safe in every case, just that it can't prove that it is safe. There are plenty of things you must do in unsafe that are safe, just not provably by the compiler. Thus, needing to use unsafe does not always imply actual unsafe operations.

If someone was under the impression that they were recommended to use unsafe in a safe way, but there were other caveats to the usage they weren't aware of, bad things could happen. I don't believe it's sufficient in a guide to rely on the fact that unsafe is recommended to convey the level of danger that recommendation entails.


unsafe literally means "you, compiler, cannot verify that this does not violate memory safety, but I have done so, so please trust me."

Violating memory safety in unsafe code is UB.


Is it the case that "unsafe" tells the compiler to not perform its memory safety checks on that section of code, presumably because it's not possible? (What would happen if you put only code that could be verified by the compiler in an unsafe block?) If so couldn't you also think of it as a message for the next programmer who looks at the code? This section has not/cannot be verified by the compiler, approach with care/skepticism?


> Is it the case that "unsafe" tells the compiler to not perform its memory safety checks on that section of code,

So, unsafe Rust is a superset of safe Rust. Adding `unsafe` around some code lets you do four things:

* Dereferencing a raw pointer

* Calling an unsafe function or method

* Accessing or modifying a mutable static variable

* Implementing an unsafe trait

That's it. Nothing else changes, you get these additional abilities. This is very important, conceptually. Tons of other checks are still on, etc.

With that in mind,

> (What would happen if you put only code that could be verified by the compiler in an unsafe block?)

It would function identically.
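A small illustration of the first ability on the list above (dereferencing a raw pointer), assuming nothing beyond std; note that everything else -- type checking, borrow checking, slice bounds checks -- still applies inside the block:

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32; // creating a raw pointer is safe

    // Dereferencing it is one of the extra abilities `unsafe` grants:
    let y = unsafe { *p };
    assert_eq!(y, 42);

    // Still an error inside `unsafe`, because type checks never turn off:
    // let s: &str = y;
}
```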


No, if you call a safe function that performs bounds checks (like accessing a Vec by index) in an unsafe block, the compiler still emits the checks. You need to explicitly call the unsafe versions of those operations to remove them.
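For instance (indices are illustrative): the checked and unchecked forms coexist inside the same `unsafe` block, and only the second one actually drops the bounds check.

```rust
fn main() {
    let v = vec![10, 20, 30];
    let i = 1;
    unsafe {
        // Still bounds-checked: this would panic for i >= 3, even here.
        let a = v[i];
        // Explicitly unchecked: the check is gone; i must be in range.
        let b = *v.get_unchecked(i);
        assert_eq!(a, b);
    }
}
```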


'break your code in surprising ways' is super vague. Saying 'may cause out of bounds memory access/writes should the input not be valid UTF-8' explicitly is probably more scary.


The nature of undefined behavior is in fact super vague; it's not actually possible to say what will happen.


While technically true, a specific example helps the reader conceptualize what level of danger that means, when they don't already have a concrete understanding like you do. Recall that the breadth of experience for rust's userbase is much larger than e.g. C.


Oh yeah, I mean, I'm not saying I disagree with adding detail, I can just see the impulse to not, since anything you say may or may not be true.


me talking about rust's userbase

steveklabnik's profile: "Rust core team"

I'm guessing you know that a lot better than I do lol.


It's all good! You're absolutely right, and it's easy to forget when you're inside a bubble.


> breadth of experience for rust's userbase is much larger than e.g. C

Really? I think you mean the opposite here.


I read this as saying "not depth", that is, C programmers will have a deep understanding here, but since Rust is drawing in a lot of new blood, they on average will not.


Just say the worst thing that can happen, out of bounds access probably qualifies. :-)


"Could potentially emit code to rm -rf /"


That's the kind of vagueness that should sound like a warning to anyone considering that approach.


The nature of warnings means that any time you make assumptions about those that might need the warnings having enough knowledge to correctly assess an ambiguity, you've likely just failed a portion of the people the warning was meant to help.

The correct response when warning people of potential problems is never "oh, they should be able to figure out whether this applies to their case".


Right. The correct answer here is to convert external data into UTF-8 once and keep it that way. You're probably not going to become compute-bound, even if you're just reading in text and writing it out again. The UTF-8 check is linear time.


Most of the Rust programs I write are compute bound, and finding ways to avoid the separate UTF-8 validation step is critical to their performance.


That's because you write string search programs. Do those even need to run in UTF-8 space, as opposed to pure byte strings?


If you are reading text and writing it out you generally do not want to convert to utf8 as it is potentially lossy. Think of unix filters.


To me "your code breaks in surprising ways" means something more like "the user sees gibberish/wrong results" and not "your program segfaults". Segfaulting is not actually a surprising result to me - I have seen it over and over on out of bounds access.


A buffer overflow doesn't necessarily cause a segfault - that's the problem! A guaranteed segfault would be a completely valid and safe way to handle a buffer overflow. But segfaults only happen when your program tries to access memory it does not own. It is improbable that an overflowing buffer is straight at a page boundary, doubly so if the buffer is allocated on the stack. Instead, you get gibberish output, an invalid and unexpected program state, no effect at all (except once in a blue moon) or in the worst case a code injection vulnerability. That is, undefined behavior.


Yes, the overflow either segfaults your program, or doesn't. It's not undefined. Computers don't perform random operations. Just say "it will produce buffer overflow" - that is absolutely a defined operation: the CPU will execute a load from an address that hasn't been defined at that place in code and will either contain leftover data, or refer to an unmapped page, triggering a segfault. This isn't quantum mechanics where the underlying physical reality is inherently undefined until you measure it (caveat: as far as we know, at least), this is just insufficient knowledge of hidden variables. I understand that the compiler might emit instructions under various assumptions, which is what we mean by "undefined behaviour" when those assumptions don't hold, but once there's actual machine code running on the actual CPU, everything is absolutely perfectly defined.

All the solutions and problems in computing stem from the fact that we're programming completely deterministic Turing machines that do exactly what they're told.


That's a very narrow view of the problem. There are so many moving parts in a program that overwriting an arbitrary chunk of stack is effectively random. Do you object to the term "random number generator" too?

Just saying "or doesn't" is not a useful description. The vast majority of horrible broken behaviors fit inside of that bucket, where it doesn't segfault but then goes on to do the wrong thing at an unexpected place.

And because we're using an optimizing compiler, you can't give a simple description like "overflows a buffer". When the compiler assumes your code is correct, it might output instructions that behave in 'impossible' ways when fed invalid data. For example an if/else that takes neither branch, or both branches, or code that verifies a number has a certain value yet outputs a completely different value. You can only make assertions about what a particular compile will do, and that's obsolete information immediately.


As soon as you use more than one clock domain it's no longer completely deterministic.

https://en.wikipedia.org/wiki/Metastability_in_electronics


I'm sorry but this is useless pedantry that adds nothing to the discussion.


That seems like a really unfortunate design decision. I used to think that Java's use of UTF-16 for strings was just a problematic legacy thing, but compared to this it seems quite good. Strings are pretty high performance and there are no complex calculations to do indexing or bounds checks. And in Java 9 the JVM can switch between UTF-16 or Latin1 encodings on the fly, which both uses less RAM and speeds things up simultaneously. There are no memory safety issues caused by character encodings.


I think there might be a few misconceptions in your comment, but it's hard to say for sure. This article isn't written in a style that I particularly enjoy, but it's not too long and contains some useful facts that might help dispel those misconceptions: http://utf8everywhere.org

I do have my unrelated niche complaints about Rust's string story (and I have vague plans to resolve them), but Rust's string implementation is my favorite among any other language I've used.


What seems like an unfortunate design decision? I'm unclear what concerns you're seeing that would suggest UTF-16 as preferable to UTF-8, either in terms of performance or memory safety.


To not only use UTF-8 as the internal string encoding but practically mandate it, if you want to remain safe.

UTF-8 is a fine transport format, but for raw runtime performance it's obviously going to be an issue if you ever need to iterate over characters, do substring matches, things like that because you can't do constant time "next char" or indexing.

UTF-16 doesn't let you do that either in the presence of combining characters, but they're pretty rare and for many operations it doesn't really matter.


I feel like your comment contains a lot of misunderstanding about UTF-8. For example, UTF-8 is self-synchronizing, which means you can indeed find the "next char" in constant time.

UTF-8 is certainly not a problem for runtime performance. Substring search, for example, is as straight-forward as you might imagine. You have a needle in UTF-8 and a haystack in UTF-8, and a straight-forward application of `memmem` will work just fine (for example). In fact, UTF-8 works out great for performance, because it's very simple to apply fast routines like `memchr`. e.g., If you `memchr` for `a`, then because of UTF-8's self-synchronizing property, any and all matches for `a` actually correspond to the codepoint U+0061.

Indexing works fine so long as your indices are byte offsets at valid UTF-8 boundaries. Byte offset indexing tends to be useful for mechanical transformations on a string. For example, if you know your `substring` starts as position `i` in `mystr`, then `&mystr[i + substring.len()..]` gives you the slice of `mystr` immediately following your substring in constant time. When all your APIs deal in byte offsets, this turns out to be a perfectly natural thing to do.

Generally speaking, indexing by Unicode codepoint isn't an operation you want to do, because it tends to betray the problem you're trying to solve. For example, if you wanted to display a trimmed string to an end user by "selecting the first 9 characters," then selecting the first 9 codepoints would result in bad things in some circumstances, and it's not just limited to the presence of combining characters. For example, UTF-16 encodes codepoints outside the basic multilingual plane using surrogate pairs, where a surrogate pair consists of two surrogate codepoints that combine to form a single Unicode scalar value (i.e., a non-surrogate codepoint). So if you do the "obvious" thing with UTF-16, you'll wind up with bad results in not-exactly-corner cases.

It's worth noting that Rust isn't alone in this. Go represents strings similarly and it also works remarkably well. (The only difference between Go and Rust is that Rust's string type is guaranteed to contain valid UTF-8 whereas Go's string type is conventionally UTF-8.) Notably, you won't find "character indexing" anywhere in Go's standard library or various Unicode support libraries. :-)

I would very strongly urge you to read my link in my previous comment to you. I think it would help clarify a lot of misconceptions.
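A small sketch of the byte-offset idiom described above (strings are illustrative): `find` returns a byte offset at a valid UTF-8 boundary, so constant-time slicing and "next char" both fall out naturally.

```rust
fn main() {
    let mystr = "naïve approach";
    let substring = "ïve";

    // find() returns a byte offset at a valid UTF-8 boundary.
    let i = mystr.find(substring).unwrap();

    // "Next char" from a boundary is constant time:
    assert_eq!(mystr[i..].chars().next(), Some('ï'));

    // Constant-time slice of everything after the match:
    let rest = &mystr[i + substring.len()..];
    assert_eq!(rest, " approach");
}
```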


This has been repeated so many times but UTF-16 does not allow constant time indexing either. Combining characters are one case and they are not rare at all. What about surrogates? What about grapheme clusters that are a complicated sequence of emoji and emoji modifiers with ZWJ?

A better suggestion is to rethink why you need those operations in the first place.


I have been doing some exploration of how well Rust optimizes Iterators and have been quite impressed.

Writing a iterator to provide the individual bits supplied by an iterator of bytes means you can count them with

    fn count_bits<I: Iterator<Item = bool>>(it: I) -> i32 {
        let mut a = 0;
        for i in it {
            if i { a += 1 }
        }
        a
    }
Counting bits in an array of bytes would need something like this

    let p:[u8;6] = [1,2,54,2,3,6];
    let result = count_bits(bits(p.iter().cloned()));
Checking what that generates in asm https://godbolt.org/g/iTyfap

The core of the code is

    .LBB0_4:
        mov     esi, edx        ;edx has the current mask of the bit we are looking at
        and     esi, ecx        ;ecx is the byte we are examining
        cmp     esi, 1          ;check the bit to see if it is set (note using carry not zero flag)
        sbb     eax, -1         ;fun way to conditionally add 1
    .LBB0_1:
        shr     edx             ;shift mask to the next bit
        jne     .LBB0_4         ;if mask still has a bit in it, go do the next bit otherwise continue to get the next byte 
        cmp     rbx, r12        ;r12 has the memory location of where we should stop.   Are we there yet?
        je      .LBB0_5         ; if we are there, jump out. we're all done
        movzx   ecx, byte ptr [rbx]  ;get the next byte
        inc     rbx             ; advance the pointer
        mov     edx, 128        ; set a new mask starting at the top bit
        jmp     .LBB0_4         ; go get the next bit
    .LBB0_5:
Apart from magical bit counting instructions this is close to what I would have written in asm myself. That really impressed me. I'm still a little wary of hitting a performance cliff. I worry that I can easily add something that will mean the optimiser bails on the whole chain, but so far I'm trusting Rust more than I have trusted any other optimiser.

If this produces similarly nice code (I haven't checked yet) I'll be very happy

   for (dest,source) in self.buffer.iter_mut().zip(data) { *dest=source }


How much of this is the Rust compiler and how much is generic LLVM optimization passes?


It's the combination of LLVM, the Rust compiler providing enough information to LLVM, and very optimized library code. As well as plenty of testing to make sure this kind of stuff doesn't regress :)


The compiler itself currently does only some basic optimizations, it's all LLVM.

However, the compiler does have perfect aliasing info, and it does provide a subset of this info to LLVM (I don't think LLVM IR currently supports more fine grained aliasing info being provided from the compiler). This does help certain optimizations; though I'm unsure if this one is one of them.


Okay, wait. Every modern CPU has a popcount instruction, so any hand-coded implementation would use that, meaning the compiler output is actually pretty bad in an absolute sense.

But if you find popcount too "magical", the commonly-known fast way to count bits is via masking, shifts and adds, so that you do it in log(n) steps. Which also would perform much better than this solution.

So what you're really saying is "the compiler managed to make a pretty efficient representation of the naive solution" which is fine but it does not mean your code is fast.
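The mask-shift-add approach the parent mentions can be sketched like this (a standard SWAR popcount, not code from the thread): log2(32) = 5 steps for a u32, no popcnt instruction required.

```rust
// Each step sums adjacent fields of double the width: bit pairs, then
// nibbles, then bytes, then halfwords, then the whole word.
fn popcount(mut x: u32) -> u32 {
    x = (x & 0x5555_5555) + ((x >> 1) & 0x5555_5555); // pairs of bits
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // nibbles
    x = (x & 0x0F0F_0F0F) + ((x >> 4) & 0x0F0F_0F0F); // bytes
    x = (x & 0x00FF_00FF) + ((x >> 8) & 0x00FF_00FF); // halfwords
    (x & 0x0000_FFFF) + (x >> 16)                     // whole word
}

fn main() {
    assert_eq!(popcount(0), 0);
    assert_eq!(popcount(0b1011), 3);
    assert_eq!(popcount(u32::MAX), 32);
    // In practice you'd call the intrinsic-backed method instead:
    assert_eq!(popcount(12345), 12345u32.count_ones());
}
```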


> Okay, wait. Every modern CPU has a popcount instruction

What do you consider a "modern CPU"? Atom chips sold less than a decade ago didn't support popcnt. AMD shipped some C-series chips without support for it as recently as 2012. The low-end chips were likely to be sold later without refreshes to newer features in some cases, and those are also likely the ones to be repurposed for small and cheap x86 devices later. I wouldn't want the default compilation settings without specifying CPU extensions to use an extension that might not exist on my target platform.


This is why you have a fallback to a generic version.


As I stated in another reply. I don't actually want to count bits at all. Counting is just the simplest thing I could do with the bits in the test case. Turning a stream of bytes into a stream of bits can be quite useful.


But if you are reading a stream of bits, you don't want to read one bit at a time either, because that is pathologically slow. You want to read n bits at a time (where n varies each call, probably) at which point you're doing a very standard mask-and-shift with no magic...


That's really impressive, but I'm a little surprised LLVM didn't optimize that inner loop to a popcnt instruction.


You may have to specify a flag like "-C target-cpu=native" -- not all CPUs support popcnt, so LLVM can't generate it without knowing details about the target. I can't get the compiler on godbolt.org to generate it either way, though.


You need "-C opt-level=3 -C target-cpu=native -C target-feature=+ssse3"

I actually wrote about it last week: http://tmccrmck.github.io//post/rust-optimization-partii/


You can drop `-C target-feature=+ssse3` since `target-cpu=native` will cover that. The `-C opt-level=3` isn't necessary either, but if you drop that, it looks like there's a function that doesn't get inlined. However, even without opt-level=3, the popcnt instruction is still emitted.


There's a little typo in your C implementation of popcount: you reference `x` when you probably meant `n`


Thanks. I fixed it.


Does Rust support FMV (function multi-versioning)?


> If this produces simiarly nice code (I haven't checked yet) I'll be very happy

You didn't specify what the type of `buffer` was, so I picked a `u8`. This code[1]:

    pub struct Thing {
        buffer: Vec<u8>,
    }
    
    impl Thing {
        pub fn copy(&mut self, data: &[u8]) {
            for (dest, &source) in self.buffer.iter_mut().zip(data) {
                *dest = source;
            }
        }
    }
Produces this assembly:

    _ZN10playground5Thing4copy17hf523bcb10e2298f3E:
    	.cfi_startproc
    	pushq	%rax
    .Ltmp0:
    	.cfi_def_cfa_offset 16
    	movq	16(%rdi), %rax
    	cmpq	%rdx, %rax
    	cmovbeq	%rax, %rdx
    	testq	%rdx, %rdx
    	je	.LBB0_2
    	movq	(%rdi), %rdi
    	callq	memcpy@PLT
    .LBB0_2:
    	popq	%rax
    	retq
The call to `memcpy` is what makes me happy.

----

Your `count_bits` already exists as a combination of iterator adapters (`filter`[2] and `count`[3]):

    a_bit_iterator.filter(|&bit| bit).count()
Although if you have a numeric value, I'd suggest using `count_ones`[4], which can use the `popcnt` intrinsic.

If you wanted to count all the bits in an array, I'd suggest `map`[5] and `sum`[6]

    let p = [1u8, 2, 54, 2, 3, 6];
    let result: u32 = p.iter().map(|b| b.count_ones()).sum();
If you wanted to keep the bit iterator, you could also use `flat_map`[7].

[1]: https://play.integer32.com/?gist=03f8ffbe3ade6ced4d315c8e020...

[2]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho...

[3]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho...

[4]: https://doc.rust-lang.org/std/primitive.u8.html#method.count...

[5]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho...

[6]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho...

[7]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho...


>The call to `memcpy` is what makes me happy.

That's the sort of thing I was hoping to see.

>Your `count_bits` already exists as a combination of iterator adapters (`filter`[2] and `count`[3]):

That's the problem with simple examples. I don't actually want to count bits. It was just the minimum workload I could think of to generate a result from the conversion.

Seeing how `for (a, b) in ai.zip(bi)` works well, I'll probably be writing a bitmap glyph renderer that is basically `if b { *a = color }`.
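A minimal sketch of that blit loop (the `color` value, row layout, and bit source are made up for illustration, not from any real renderer):

    // zip stops at the shorter of the two sides, so no bounds checks
    // are needed in the loop body.
    fn blit_row(dest: &mut [u32], bits: impl Iterator<Item = bool>, color: u32) {
        for (a, b) in dest.iter_mut().zip(bits) {
            if b {
                *a = color;
            }
        }
    }

    fn main() {
        let mut row = [0u32; 4];
        blit_row(&mut row, [true, false, true, true].iter().copied(), 0xFF_FF_FF);
        println!("{:?}", row);
    }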


I'm not a compiler expert, but it seems like some of these should be unnecessary, especially with Rust's strong knowledge of types and ownership lifespans. Like this example from the article:

    let nopes : Vec<_> = bleeps.iter().map(boop).collect();
    let frungies : Vec<_> = nopes.iter().filter(|x| x > MIN_THRESHOLD).collect();
where he recommends avoiding the first collect(). Can't the optimizer do that for you if you don't do anything else with nopes?


There might be side effects from the first call to `collect`, so the compiler can't get rid of it without potentially changing the semantics of the program.


Best example of this (at least in Java):

    List<CompletableFuture<Response>> futures =
        requestList.stream().map(Client::makeRemoteCall).collect(toList());
    
    List<Response> responses =
        futures.stream().map(CompletableFuture::join).collect(toList());
Basically, we need to start all of the futures, then wait for all the futures. Eliminating the first call to `.collect` would force these remote calls to happen one at a time.


This is the kind of example I was hoping for, and I understand a little better now where collect() can have semantically-visible side effects. It seems like it should be possible to have a smarter type system that can tell the difference between futures where collect() can be optimized out and ones where they can't though.


> It seems like it should be possible to have a smarter type system that can tell the difference between futures

While that Java code is neat, you don't need to get anywhere near concurrency to exhibit problems with side effects. Here's a program with two iterator chains and two calls to collect:

    let foo = [1, 2, 3];
    
    let bar: Vec<_> = foo.iter()
        .map(|x| {
            print!("{} ", x);
            x + 10
        })
        .collect();
        
    let qux: Vec<_> = bar.iter()
        .map(|x| {
            print!("{} ", x);
            x * 10
        })
        .collect();
This program prints "1 2 3 11 12 13 ".

Here's that same program, but with the first collect removed and the iterator chains collapsed into one:

    let foo = [1, 2, 3];
    
    let bar: Vec<_> = foo.iter()
        .map(|x| {
            print!("{} ", x);
            x + 10
        })
        .map(|x| {
            print!("{} ", x);
            x * 10
        })
        .collect();
This program prints "1 11 2 12 3 13 ".

When you have two iterator chains with two calls to collect, everything in the first iterator runs to completion, then the second iterator chain runs. When you collapse those two chains into one, you change the order in which things happen.

It's still true that a smarter type system would be able to track side effects, but like all effects systems, these things are viral, and it's not immediately clear whether the annotation burden makes up for itself in optimization potential.


Something something Applicative vs Monadic code mumble handwave.

Actually that's part of why ApplicativeDo was created: https://research.fb.com/publications/desugaring-haskells-do-... It's pretty cool.


On a high level it seems like it could, but collect() performs heap allocation (and size checks and reallocations for unknown-length iterators), and that's probably too big a side effect for LLVM to ignore.


LLVM will remove heap allocations if they're unused. Optimizing the heap isn't something that LLVM refuses to do as a matter of principle.


Does current Rust contain a way to mark functions as Pure?


Not in the referentially-transparent sense. Most Rust functions meet almost all of the practical criteria for purity (i.e. they do not mutate global state, or any other state that was not explicitly passed in to the function, and do not do I/O), but that's only a comfort for the programmer's ability to reason about the code; this weakened notion of purity-by-default isn't enough to allow the typical optimizations via purity (e.g. memoization).


> but that's only a comfort for the programmer's ability to reason about the code

A way to mark functions as pure for this purpose would be great! Especially if it's not as fraught as const in C++.


We actually did have this once, but it wasn't really worth it, so it was removed. https://news.ycombinator.com/item?id=6940624 is the HN discussion, but it looks like the link might now be wrong?

It was also a very, very long time ago, and so today's Rust might be different enough that those reasons don't apply any more.


The reason at that point was that there weren't any practical benefits and that it was preferable to wait for a more general mechanism. The first claim is dubious, but I can empathize with the second, as long as it doesn't become tacked on.

Haskell can do cool optimizations that make it feel like magic sometimes.


Note that a lot of these optimizations aren't necessary in Rust -- e.g. list fusion is only valuable in Haskell because mapping over lists is exposed as a one-shot operation that produces a new list. So `map map map list` is conceptually building 2 whole lists of temporaries you don't care about (and you often don't actually care about the last one either, in cases where you just iterate over it and discard it).

Meanwhile mapping in Rust generally takes an iterator and produces another iterator that will apply the given closure to the current element as it's yielded. So the naive codegen for iter().map().map().map().collect() is exactly what list fusion is trying to produce -- no temporary lists.

TL;DR: making `map` have the monadic `T[U] -> T[V]` signature is really expensive. ¯\_(ツ)_/¯


In some ways, const fn reminds me of this.


> We actually did have this once, but it wasn't really worth it, so it was removed.

Ah, yeah I had a vague memory that that was the case, which is why I asked.. I seem to recall it being removed, but I only follow Rust news, don't use it, so I wasn't sure of the reasons or implications.


I think that being able to enforce purity would be fantastic for actors, agents, and/or game loops.

https://news.ycombinator.com/item?id=8609775


It wouldn't help in most of these cases. It'd be too burdensome to require that map/filter/etc. take pure arguments. In the absence of that, the compiler is left with examining the specific function being called, which it can already do as a direct call as long as the body is visible (#[inline]). If the body is visible, LLVM will internally automatically mark the function as pure (readnone) if it is.


const does not imply pure. If applied to a member function, it makes `this` const; if applied to a type, it makes the type const. Purity is a different concept. The closest match would probably be constexpr.


> const does not imply pure. If applied to a member function, it makes this const; if applied to types it makes types const.

You just demonstrated what I mean when I comment that const is fraught.

> Purity is a different concept.

Purity maps more closely to an abstract concept and is easier for people to reason about.


Good question. That would mean chaining the two statements together so "nopes" is never allocated as a symbol. But with `let`, the user is explicitly instructing the compiler to allocate a reference for "nopes", so I don't think the compiler would chain the statements, unless it takes on the task of checking that "nopes" is not used elsewhere.

<Disclaimer: I don't implement compilers, so I could be totally wrong>


The code is problematic, but not because of the let. Optimizations work basically on the as-if principle: they're free to execute code any way they like, but the results must be as if they followed your explicit instructions. Compilers need not, and usually do not, preserve references just because you gave them a name--if you've ever run gdb on optimized code, you'll note how many variables become "<optimized out>" (although that's often a lie for other reasons).

The real problem is that you're shoving the variable into a Vec<>, which means that you are doing heap allocation. This means that optimizing it out requires matching the heap allocation to the free, noting that the side effects of these two function calls are only about allocation. There's also issues with respect to inlining, dead-code elimination, and then doing data dependence analysis to prove that the two loops can be fused.


Compiler backends don't particularly care about the user-defined variables in the source, since a typical compilation step is to convert to SSA form ( https://en.wikipedia.org/wiki/Static_single_assignment_form ). And it's relatively easy for a backend to remove allocations (or e.g. ignore the act of taking a reference) for variables that never get used. As long as the semantics are preserved, it's all good. The problem in this case is that removing implicitly that first call to collect might change the semantics of the program.


    for i in 0..(xs.len()) {
        let x = xs[i];
        // do something with x
    }
> should really be this:

    for x in &xs {
        // do something with x
    }

I am curious why the compiler can't rewrite the former to the latter?


Because in the former case, the optimizer has to prove that the length of the array cannot change during the body of the loop, while in the latter case, that's guaranteed by the language semantics.
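A concrete sketch of why that proof is non-trivial: the range `0..xs.len()` is captured once up front, but the loop body is still free to change the length, so each `xs[i]` keeps its bounds check. (The `for x in &xs` form borrows `xs` for the whole loop, which rules this mutation out entirely.)

    fn main() {
        let mut xs = vec![1, 2, 3];
        for i in 0..xs.len() { // range is 0..3, fixed before the loop runs
            if xs[i] == 2 {
                xs.push(99); // legal: the indexing borrow has already ended
            }
        }
        println!("{:?}", xs); // [1, 2, 3, 99]
    }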


Also because the iterator could do something completely different from an iteration by sequential index, no?


Doesn't `let x = xs[i]` immutably borrow from xs? So for the duration of x's lifetime (which is the entire for-body block), xs cannot be changed and therefore its length must remain the same.

Though this might be information that rustc knows about but not LLVM.


In the case of "let x = &xs[i]" it would immutably borrow. But we'd need more MIR optimizations to make use of that fact.


A Sufficiently Smart Compiler would be able to.

The second just increments a pointer which starts at the first elem of the array; the first increments a counter and then uses it as an offset into the array, which is bounds checked (the second never generates bounds checks). Proving that that counter never exceeds the end of the array, so the bounds check can be dropped, is not trivial.


At least in simple cases, it generates almost identical code for both: https://godbolt.org/g/twuLd1

(I may have gotten something wrong, this is my first Rust program! Yay!)


>Sometimes, when changing an enum, we want to keep parts of the old value. Use mem::replace to avoid needless clones.

    use std::mem;

    enum MyEnum {
        A { name: String, x: u8 },
        B { name: String }
    }

    fn a_to_b(e: &mut MyEnum) {

        // we mutably borrow `e` here. This precludes us from changing it directly
        // as in `*e = ...`, because the borrow checker won't allow it. Therefore
        // the assignment to `e` must be outside the `if let` clause. 
        *e = if let MyEnum::A { ref mut name, x: 0 } = *e {

            // this takes out our `name` and put in an empty String instead
            // (note that empty strings don't allocate).
            // Then, construct the new enum variant (which will 
            // be assigned to `*e`, because it is the result of the `if let` expression).
            MyEnum::B { name: mem::replace(name, String::new()) }

        // In all other cases, we return immediately, thus skipping the assignment
        } else { return }
    }
Don't get me wrong, I think Rust is an incredibly impressive language, but this is nuts. For the first time ever, I had a moment of appreciation for the "simple clarity" of C++11 move constructors. If it weren't for the comments and the documentation[1] I wouldn't have had the slightest clue what this code is doing (a hack to fool the borrow checker while allowing for Sufficiently Smart Compilation to a no-op[2] ... I think).

This is a good example of the main conceptual aspect of Rust where I feel it could use improvement.[3] A lot of its features marry very high-level concepts (like algebraic data types) to exposed, low-level implementations (like tagged unions). Now, there's nothing wrong with the obvious choice to implement ADTs as tagged unions, but the nature of Rust as a language that exposes low-level control over allocation and addressing, in combination with the strictures of the borrow checker, means enums and other high-level features live in a sort of uncanny valley, falling short of either the high-level expressiveness of ADTs or the low-level flexibility of tagged unions (without expert-level knowledge or using `unsafe`).

Similarly, the functional abstractions almost feel like a step backwards from the venerable C++ <algorithm>, abstraction and composition-wise. Very nitty-gritty, low-level implementation details leak out like a sieve — you can't have your maps or folds without a generous sprinkling of `as_slice()`, `unwrap()`, `iter()` / `iter_mut()` / `into_iter()` and `collect()` everywhere, although at least for this case you would be able to figure out the idioms by reading the official docs. But nevertheless it seems like reasonable defaults could be inferred from context (with the possible exception of `collect()` since it allocates), while still allowing explicitness as an option when you want low-level control over the code the compiler generates.

In this case, enums are a language-level construct, not a library, so the borrow-checker really shouldn't be rejecting reasonable patterns. It (legal to alias between the structural and nominal common subsequence of several members of an enum) should be the default behavior and not require an nonobvious, unreadable hack such as the above. At the very least that behavior (invaluable for many low-level tasks) should be easy to opt in to with e.g. an attribute.[4]

[1] https://doc.rust-lang.org/std/mem/fn.replace.html

[2] Or rather since it is a tagged union, in MASMish pseudocode something like

        jnz x_nonzero
        mov MyEnum.B, e.tag
    x_nonzero:
        ret
Although it could still be a no-op depending on what else Sufficiently Smart Compiler/LLVM inlines.

[3] And I'm not saying I have the solution or even that all-around-better solutions exist.

[4] Something like

    #![safe_alias_common_subsequence(structural)]
    #![safe_alias_common_subsequence(nominal_and_structural)]


One problem with what you propose (as I understand it) is that it's not safe to have uninitialized data in the presence of panics. A panic can cause control flow to unwind and destructors to be invoked, and if you have partially uninitialized data you have a recipe for use after free. In C++ you just get use-after-free hazards everywhere (which is one of the many reasons why exception safety in C++ is extremely hard).

Another problem with your proposal is that it's not safe to access the structural and nominal common subsequence of several members of an enum in the same way, because the Rust compiler can and will reorder the fields differently in different variants in order to fill padding.

That said, it sounds like what you're looking for is one of the various inheritance proposals, which have safe access to common enum fields as one of the main guarantees. This would make this pattern much more ergonomic.


Ah, I wasn't aware that Rust reorders structure members. Although I still feel it should be possible, whether the compiler just needs to change the tag or has to shift data around.

What I was proposing was that common subsequences of enum fields be treated by the compiler as synonyms, but accesses to uninitialized data would still be illegal. One way this might be implemented is by, when necessary, creating additional tags behind the scenes, e.g. MyEnum::__AB__ meaning A was initialized but B is the active member, so only accesses to their common subsequence are legal, MyEnum::__BA__ meaning the reverse, and so forth. Or it could be restricted to fields that contain no members outside their common subsequence with nontrivial destructors. Or the compiler could null out any uninitialized references to such.

>That said, it sounds like what you're looking for is one of the various inheritance proposals, which have safe access to common enum fields as one of the main guarantees. This would make this pattern much more ergonomic.

Interesting. Link?


> What I was proposing was that common subsequences of enum fields be treated by the compiler as synonyms, but accesses to uninitialized data would still be illegal. One way this might be implemented is by, when necessary, creating additional tags behind the scenes, e.g. MyEnum::__AB__ meaning A was initialized but B is the active member, so only accesses to their common subsequence are legal, MyEnum::__BA__ meaning the reverse, and so forth. Or it could be restricted to fields that contain no members outside their common subsequence with nontrivial destructors. Or the compiler could null out any uninitialized references to such.

I feel like that's too much hidden behind-the-scenes magic for a systems language, and I suspect the majority of the Rust community would feel the same way, but feel free to file an RFC.

> Interesting. Link?

https://github.com/rust-lang/rfcs/issues/349


> Similarly, the functional abstractions almost feel like a step backwards from the venerable C++ <algorithm>, abstraction and composition-wise. Very nitty-gritty, low-level implementation details leak out like a sieve — you can't have your maps or folds without a generous sprinkling of `as_slice()`, `unwrap()`, `iter()` / `iter_mut()` / `into_iter()` and `collect()` everywhere

That's not been my experience at all. In fact, I've found <algorithm> to be not only extremely limiting but also painfully verbose due to the need for begin/end pairs everywhere.


Slight hyperbole there; it's more that I feel the contrast between the two could be much greater.


> With some investment into optimizations, matching or exceeding C’s speed should be possible in most cases.

What class of algorithms is faster in Rust than it is in C?


I don't think it can be generalized like that.

Theoretically, Rust's memory ownership/aliasing rules are stricter and more granular than C's restrict, so some pointer-heavy code could optimize better. Rust is very good at inlining by default. Rust makes it easy to use stack-allocated structures. But C can do the same, it's just a matter of effort.

Both languages are low level enough that you can use them as a portable assembly and endlessly tweak them to one-up the other. If some C code doesn't optimize well you can write a more contorted code that will.
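A tiny sketch of the aliasing point: `a: &mut i32` and `b: &i32` are guaranteed not to overlap, so the optimizer is allowed to read `*b` once and fold the two additions, whereas the equivalent C needs `restrict` to make the same promise. (Whether rustc actually emits the `noalias` metadata has varied across releases due to LLVM bugs.)

    fn add_twice(a: &mut i32, b: &i32) {
        // b cannot alias a, so *b can be loaded once and reused
        *a += *b;
        *a += *b;
    }

    fn main() {
        let mut x = 1;
        add_twice(&mut x, &10);
        println!("{}", x); // 1 + 10 + 10 = 21
    }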


Yeah the only thing that Rust might have over C, in terms of really optimized implementations, is low-level idioms that C declares to be UB, but Rust declares to be defined (generally to be what x64 hardware does).

Maybe something that leverages signed overflow, overlong shifts, or type punning.

But if you have enough control over your codebase to mandate the compiler/flags its built with, then you can generally tell the major C compilers to act like Rust in these cases.

That said, the expected win for Rust over C(++) in practice is that you can be more "reckless", because you have a stronger type system protecting you from messing things up. A production-quality C(++) codebase might rightly do more copies, use more reference counting, and use less concurrency just because the risk of doing otherwise isn't worth the potential performance wins.

Organizations have limited resources to commit to optimizing/verifying code. Rust is intended to get you more bang for your buck.


> That said, the expected win for Rust over C(++) in practice is that you can be more "reckless", ...

I'm glad to see somebody articulate this observation. SaferCPlusPlus[1] is meant to, in part, bring this benefit to existing C++ code bases. The question is, would a borrow checker for C++ make sense?

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus


"Similarly, the Read::lines() iterator is very easy to use, but it has one downside: It allocates a String for each line. Manually allocating and reusing a String will reduce memory churn and may gain you a bit of performance."

Back when Rust used "internal" iterators, this could be the default. Now, you can encapsulate it by passing a function to handle each line.
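A sketch combining both ideas: one reused String instead of a per-line allocation, wrapped so callers just pass a closure per line. `for_each_line` is a hypothetical helper, not anything from std:

    use std::io::BufRead;

    fn for_each_line<R: BufRead>(mut reader: R, mut f: impl FnMut(&str)) -> std::io::Result<()> {
        let mut buf = String::new();
        while reader.read_line(&mut buf)? > 0 {
            f(buf.trim_end()); // trims the trailing newline (and \r on Windows)
            buf.clear(); // keep the allocation for the next line
        }
        Ok(())
    }

    fn main() -> std::io::Result<()> {
        let input = "alpha\nbeta\ngamma\n";
        let mut n = 0;
        for_each_line(input.as_bytes(), |_line| n += 1)?;
        println!("{}", n);
        Ok(())
    }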



