Calculating a square is ~1. A square root it ~N. Much harder to do. Its a common...

borkabrak · on March 8, 2016

She's right about squaring on each iteration, though. And granted, a square root is much more expensive, though done only once. Which method is faster would depend on the number of iterations.

uxcn · on March 8, 2016

The latency for sqrtss on broadwell is 11 cycles with a throughput of 4, where mul is a latency of 3 and throughput of 1. So, using some concrete numbers, sqrt is more expensive, but not polynomially or even an order of magnitude.

JoeAltmaier · on March 8, 2016

Right with modern floating point implementations its not the old guess-and-iterate method any more. SQRT is probably now on the order of an inverse?

uxcn · on March 8, 2016

I honestly haven't looked into it in that much detail, but if I had to guess... probably. I'm not sure what the implementation used in hardware currently is, but at very least it's constant bounded,

Conversions are still somewhat expensive, but I do know compared to polynomial time or a large enough constant, it can be a better choice for an optimization. For example, computing log10 of an integer.

shasta · on March 8, 2016

Since you're claiming that square root is "~N", the N here must be the number of digits. The number of multiplications required for the "optimization" you're proposing is exponential in this N.