
Let the Compiler Do the Work - milliams
http://cliffle.com/p/dangerust/6/
======
kstenerud
"sqr is a place where one might be tempted to use a macro in C, but writing
such a macro in a way that x doesn't get evaluated twice is tricky."

Slightly off-topic, but it's important to point out that in modern C,
implementing such a function as a macro is always a mistake. You'd do it as an
inline function.

Macros in modern C should only be used for code generation (for DRY). If the
language supports doing it without a macro, then do it without a macro.
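
A minimal sketch of the difference, assuming a hypothetical `next_value` helper that counts its own calls. `SQR_MACRO`, `next_value`, and `check_sqr` are made up for illustration and are not from the article:

```c
#include <assert.h>

/* Macro version: the argument text is pasted in twice. */
#define SQR_MACRO(x) ((x) * (x))

/* Inline function version: the argument is evaluated exactly once. */
static inline double sqr(double x)
{
    return x * x;
}

/* Hypothetical helper with a side effect, used to count evaluations. */
static int call_count = 0;

static double next_value(void)
{
    call_count++;
    return 3.0;
}

/* Checks how many times each form evaluates its argument. */
static void check_sqr(void)
{
    call_count = 0;
    assert(sqr(next_value()) == 9.0);
    assert(call_count == 1);   /* function: one evaluation */

    call_count = 0;
    assert(SQR_MACRO(next_value()) == 9.0);
    assert(call_count == 2);   /* macro: two evaluations */
}
```

Both forms return the same value here. The only observable difference is the number of calls to `next_value`, which is the double-evaluation problem the quoted sentence refers to.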

~~~
jnordwick
The point of doing it as a macro is to be generic over types. In C++ you
would write a template function, but there is no real option in C.

~~~
simias
That's true, there are many situations where you simply can't replace a macro
with a function because of this fact. MIN/MAX is a common example, unless you
want to have one specialized version for every type.
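
For what it's worth, C11 `_Generic` can dispatch over those specialized versions, though you still have to write one per type. A rough sketch, assuming a C11 compiler. The names `max_int`, `max_long`, `max_double`, and `check_max` are hypothetical:

```c
#include <assert.h>

/* One specialized function per supported type (hypothetical names). */
static inline int    max_int(int a, int b)          { return a > b ? a : b; }
static inline long   max_long(long a, long b)       { return a > b ? a : b; }
static inline double max_double(double a, double b) { return a > b ? a : b; }

/*
 * The controlling expression (a) + (b) is only inspected for its type and is
 * never evaluated, so each argument is evaluated once, in the call itself.
 * Types not listed here (unsigned, float, and so on) fail to compile.
 */
#define MAX(a, b)              \
    _Generic((a) + (b),        \
        int:    max_int,       \
        long:   max_long,      \
        double: max_double)((a), (b))

static void check_max(void)
{
    int i = 0;

    assert(MAX(1, 2) == 2);
    assert(MAX(0.5, 1.5) == 1.5);
    assert(MAX(1, MAX(2L, 3L)) == 3L);  /* nesting is fine */
    assert(MAX(i++, 0) == 0);           /* i++ runs once */
    assert(i == 1);
}
```

This avoids both double evaluation and macro-local variables, at the cost of listing every type by hand.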

Unfortunately there are always tradeoffs using macros for something like that,
you either end up evaluating the parameters more than once or you have to
introduce variables in the macro that may shadow existing variables. Some
compilers have extensions to help with that:
[https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html)

~~~
oefrha
> you have to introduce variables in the macro that may shadow existing
> variables.

You certainly don’t have to introduce the risk of shadowing existing
variables. You just use the standard do { ... } while (0) idiom:

    
    
      #define MACRO(x) \
          do { \
              /* Define whatever variables you like */ \
              /* Do things... */ \
          } while (0)
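
A small sketch of why the wrapper is `do { ... } while (0)` rather than bare braces: the expansion plus the trailing `;` forms exactly one statement, so it can sit in an unbraced `if`/`else`. `SWAP_INT`, `sort_pair`, and `check_sort_pair` are hypothetical names used only for illustration:

```c
#include <assert.h>

/* Hypothetical statement-like macro; swap_tmp_ is its private local. */
#define SWAP_INT(a, b)        \
    do {                      \
        int swap_tmp_ = (a);  \
        (a) = (b);            \
        (b) = swap_tmp_;      \
    } while (0)

/* Orders *lo and *hi; returns 1 if a swap happened, 0 otherwise. */
static int sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);  /* one statement, so the else still pairs up */
    else
        return 0;
    return 1;
}

static void check_sort_pair(void)
{
    int a = 5, b = 2;

    assert(sort_pair(&a, &b) == 1);
    assert(a == 2 && b == 5);
    assert(sort_pair(&a, &b) == 0);
}
```

With plain `{ ... }` in the macro, the `;` after the call would terminate the `if` and leave the `else` without a matching `if`.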

~~~
jlokier
First, you can't do that for the macros like MIN/MAX mentioned in the comment
you just replied to, and sqr mentioned earlier.

Second, you still risk shadowing existing variables, even if you make an
effort to use "unlikely" names for the locals such as
"___my_macro_secret_variable_1", if the macro might be used in one of its own
arguments. For example MACRO(MACRO(x)).

If that sounds unlikely, consider MAX(a,MAX(b,c)), which is likely to happen
eventually, if your codebase uses such macros or if they are part of a
library.

~~~
oefrha
When you decide to define new variables you have already decided to write more
than just an expression (which by definition can’t contain a statement), so
“you can’t do that for...” is a moot point. And since it’s not an expression,
it’s not going to appear as its own argument, unless you’re using a more
advanced trick such as substituting a whole block, which is not what the
do while(0) idiom is for.

~~~
simias
I can't make sense of what you're saying. See the page I linked in my previous
post for an example of what we're talking about:
[https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html)

    
    
        #define maxint(a,b) \
          ({int _a = (a), _b = (b); _a > _b ? _a : _b; })
    

> Note that introducing variable declarations (as we do in maxint) can cause
> variable shadowing [...] this example using maxint will not [produce correct
> results]:

    
    
        int _a = 1, _b = 2, c;
        c = maxint (_a, _b);

~~~
oefrha
That’s a GNU C extension. I don’t write non-portable code like that. (Okay, I
didn’t notice the link in your previous post, so I was basically replying to
the wrong thing.)

------
simias
One of the main quality-of-life improvements of Rust over C and C++ IMO is the
crate system, and how you don't have to worry about code from other files not
being inlined properly (there's LTO now but it's always more limited than
compile-time inlining and optimization).

No need to move things to headers (no headers at all, actually), or to worry
about forward declarations and exposing internal bits of the API, etc. Just write
the code normally and naturally and let the compiler figure it out. No need to
consider subtle performance tradeoffs when deciding where to write the code,
just put it where it makes sense.

~~~
cwzwarich
> there's LTO now but it's always more limited than compile-time inlining and
> optimization

In what way do you think it is more limited?

------
burundi_coffee
My prof in Introduction to Programming always said: "Generally, it's a good
idea to avoid decisions."

------
lorenzhs
Page isn't loading for me, so here's a snapshot:
[https://archive.is/i5uvl](https://archive.is/i5uvl)

------
shilch
Clang also does some crazy-good SIMD optimizations if you set -march=native.
But I think the Rust compiler is based on LLVM as well, right?

~~~
rwem
Everybody should build everything with a reasonable setting of march. Debian
Linux still builds everything for K8, a microarchitecture that lacked SSE3,
SSE4, AVX, AES, etc. Even building with march=sandybridge seems pretty
conservative and gives significant speed improvements in many common cases.

------
glangdale
I'm not a Rust guy, so can someone explain to me why SIMD intrinsics are
"unsafe"? They don't seem unsafe in the way that, I dunno, raw memory accesses
are unsafe.

Non-portable, of course, I get.

~~~
nikic
SIMD intrinsics are unsafe, because reaching a SIMD intrinsic not supported by
the host CPU is undefined behavior (in practice usually SIGILL).

~~~
keldaris
So SSE2 intrinsics should be safe for x86_64 builds?

~~~
Narishma
Yes.

------
JoshMcguigan
For your `sqr` function, what is the benefit of writing `x * x` over using
`x.powi(2)` [0]? You didn't mention it in the article, but did you find a
performance improvement from doing this?

[0]: [https://doc.rust-lang.org/std/primitive.f64.html#method.powi](https://doc.rust-lang.org/std/primitive.f64.html#method.powi)

~~~
chias
As far as CPU instructions go, multiplication is _significantly_ faster (i.e.
an order of magnitude) than exponentiation.

That said, I can't speak to whether the Rust compiler wouldn't just optimize
that away -- it seems like unrolling exponentiation into multiplication for
small constant powers would be a very safe and easy thing to do.

~~~
raphlinus
It certainly does do this:
[https://rust.godbolt.org/z/ZaDAKa](https://rust.godbolt.org/z/ZaDAKa)

Not just small constant powers either. I tried .powi(1000000) and it compiled
into a sequence of 25 vmulss instructions.

~~~
chias
Interesting, thank you!

I suspect then that defining it as x*x is just because it's easier to
type than x.powi(2).

