
Eliminating branches in Rust for fun but not much profit - dbaupp
http://kamalmarhubi.com/blog/2015/09/15/eliminating-branches-in-rust-for-fun-but-not-much-profit/
======
viraptor
There are some really good alternatives in reddit comments:
[https://www.reddit.com/r/rust/comments/3l3bnn/eliminating_br...](https://www.reddit.com/r/rust/comments/3l3bnn/eliminating_branches_in_rust_for_fun_but_not_much/)

------
bonzini
It's a bit weird that he doesn't provide his solution. I would be surprised if
this isn't faster:

    
    
        pub fn is_token1(c: u8) -> bool {
            let mask = 
                if c < 32 { !0 }
                // c != 32 && c != 34 && c != 40 && c != 41 && c != 44 && c != 47 && !(c > 57 && c < 64)
                else if c < 64 { 1 + 4 + 256 + 512 + 4096 + 32768 - (1 << 25) }
                // c != 64 && !(c > 90 && c < 94)
                else if c < 96 { 1 + (1 << 30) - (1 << 26) }
                // c != 123 && c != 125
                else if c < 128 { (1 << 27) + (1 << 29) }
                else { !0 };
    
            ((mask >> (c & 31)) & 1) == 0
        }
    

or this for 64-bit:

    
    
        pub fn is_token2(c: u8) -> bool {
            // roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"
            let mask = 
                if c < 64 { (1u64 << 33) - 1 + ((4u64 + 256u64 + 512u64 + 4096u64 + 32768u64) << 32) + !((1u64 << 57) - 1) }
                else if c < 128 { 1 + (1 << 30) - (1 << 26) + (1u64 << 59) + (1u64 << 61) }
                else { !0u64 };
            
            ((mask >> (c & 63)) & 1) == 0
        }
    

(Forgive me, it's the first time ever I write Rust...) and if it is slower, it
could also be because LLVM generates a slow bt instruction.

~~~
Gankro
Note that rust has bit literals, so you can explicitly write bit masks out:

    
    
        let mask = if blah { 0b11111110000000001001001100000101u64 } else { 0 }
    

and you can include optional underscores wherever for clarity

    
    
        let mask = if blah { 0b_11111110_00000000_10010011_00000101_u64 } else { !0 }

------
chrisdew

        fn is_token(c: u8) -> bool {
            // roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"
            c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
                c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
                c != b'{' && c != b'}'
        }
    

I've not done much with Rust, but why is it stated: “LOOK && AT && ALL &&
THOSE && BRANCHES!”

I don't see branches, just the evaluation of a complex bool expression.

If there were lots of "ifs" and "elses" I would see branches.

Can someone explain?

~~~
masklinn
> Can someone explain?

&& short-circuits, so before each &&, if the LHS is false it will jump to the
end of the whole thing with a `false` result

If you plug the function into goldbot
([http://rust.godbolt.org](http://rust.godbolt.org)) you see a bunch of
conditional jumps:

    
    
        	cmpl	$7, %ecx
    	jb	.LBB0_10
    	movzbl	%dil, %ecx
    	cmpl	$47, %ecx
    	je	.LBB0_10
    	cmpl	$44, %ecx
    	je	.LBB0_10
    	movb	%dil, %dl
    	andb	$-2, %dl
    	movzbl	%dl, %edx
    	cmpl	$40, %edx
    	je	.LBB0_10
    	cmpl	$34, %ecx
    	je	.LBB0_10
    	cmpl	$33, %ecx
    	jb	.LBB0_10
    	testb	%dil, %dil
    	js	.LBB0_10

~~~
chrisdew
Thanks, I'd not seen godbolt before - this is very useful.

------
vbezhenar
Not everyone running Haswell. Would be interesting to compare this performance
with other processors.

~~~
acconsta
Also mobile. ARM processors have less-sophisticated branch prediction, and on
the lower end, no branch prediction at all!

~~~
masklinn
Isn't that because lower end ARM is in-order?

------
evmar
I went down a similar route in a performance-sensitive lexer (started with
naive code, spent some time fiddling with different ways to express it) and
eventually used a lexer generator, which uses a combination of lookup tables
and tests.

[https://github.com/martine/ninja/blob/master/src/lexer.cc#L1...](https://github.com/martine/ninja/blob/master/src/lexer.cc#L117)

------
userbinator
Testing if a value is in one of two sets reminds me of this article:

[http://www.hugi.scene.org/online/coding/hugi%2017%20-%20coaa...](http://www.hugi.scene.org/online/coding/hugi%2017%20-%20coaax.htm)

Depending on whether the sets line up, this can be done in 2(!) instructions
totaling to 4 bytes, and around 25 clock cycles. Nowhere near as fast as the
bit-table lookup approach, but possibly the smallest.

------
phkahler
For some cases I've found bitwise stuff to be faster. I'd try replacing: (c <
128 && c > 31)

With ((c < 128) & (c > 31)) != 0 if this were in C.

~~~
Gankro
This strikes me as the kind of peephole optimization that LLVM would do anyway
(similar to people manually converting `x / 2` to `x >> 1`, my nemesis). But
maybe it's esoteric enough for them to not bother?

~~~
bonzini
At least GCC definitely does it (and also the other way round).

~~~
phkahler
When I first started using it, I was using GCC. But that was several years
ago. Maybe I can be less explicit about it now.

------
rcthompson
I know the (non-bitwise) boolean operators have shortcutting behavior, but if
evaluating the second clause has no side effects, couldn't the compiler just
cheat by evaluating both clauses unconditionally and then combining the two
boolean results using the equivalent bitwise operation, thus avoiding a
branch?

------
drewm1980
This sounds like a textbook candidate for superoptimization.

