* libguile/numbers.c (scm_product): Avoid signed integer overflow, which
modern C compilers are allowed to assume will never happen, thus
allowing them to optimize out our overflow checks.
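For illustration, a minimal sketch of the failure mode the commit describes (hypothetical code, not Guile's): the post-hoc test multiplies first and checks afterwards, so a compiler that assumes signed overflow never happens may delete the check entirely; the fix is to decide before multiplying.

    #include <limits.h>
    #include <stdlib.h>

    /* Broken: the signed multiply has already overflowed (UB) by the
       time the check runs, so the compiler may assume the test is
       always false and remove it. */
    long mul_checked_too_late(long a, long b)
    {
        long r = a * b;
        if (b != 0 && r / b != a)
            abort();
        return r;
    }

    /* Safe shape: test *before* executing the multiply.  Shown only
       for positive operands to keep the sketch short; the full sign
       analysis appears later in the thread. */
    long mul_checked_first(long a, long b)
    {
        if (a > 0 && b > 0 && a > LONG_MAX / b)
            abort();
        return a * b;
    }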
Exactly. I've seen plenty of overflow detection libraries that do precisely this. The undefined overflow plus the lack of access to CPU flags makes a pretty good argument against doing this stuff at the C/C++ level.
Interesting, but given that the code in question computes (expt 2 488) as the square of (expt 2 244), is there actually any way for that bug to cause the observed symptoms?
(2^244 is being shown as zero. 2^488 is being shown as non-zero; presumably the right value, though I haven't actually checked. If 2^244 is actually being computed as zero, then unless what's happening somehow depends on the depth of the stack -- surely not possible if this bug is the cause -- 2^488 should be computed as zero squared, which is zero.)
Sure, it's possible. The sequence of multiplications chosen to reach the exponent is different, so these expressions take different paths through scm_integer_expt.
But the sequence of multiplications chosen to reach the exponent isn't different. What the user's expt function does, when asked to compute (expt 2 488), is to compute (expt 2 244) and then square it. When it computes (expt 2 244) it does the exact same sequence of multiplications as if you just asked for that directly from the REPL.
I currently think the C compiler vendors have gone too far with this.
Although the specification says this behaviour is undefined, in practice I used to map the undefined portion to whatever my hardware did, not whatever the compiler felt like this week.
This ruins the "C as a portable assembler" use case, and means that if I were doing this sort of work I'd probably write these routines in assembler rather than fight with each new version of the C compiler.
This shows that C and C++ are getting more and more out of sync with modern CPUs. Outdated, if you will.
Modern CPUs have wide multiplies (32x32->64), wide divides (64/32->32), and a whole bunch of useful arithmetic status flags, yet C gives you access to none of that. Instead, it forces you to divide first (very slow) before multiplying, to ensure that overflow can't happen.
Even a simple division check before multiplying isn't sufficient, since there's a corner case where the two operands are INT_MIN and -1: in two's-complement arithmetic, both the division and the multiplication overflow. So you need yet another check; see the sketch after this comment.
But the gcc folks are also being obtuse: an implementation is allowed to define behaviour that is technically UB. Optimizing out operations that are perfectly well-defined on a two's-complement architecture seems to be against the "spirit" of C -- that you can guess the compiler's output with reasonable certainty. I'd call this outright maliciousness on the part of compiler implementors.
People dislike C++ saying too much is happening behind the curtains. But examples like this show that C is not much better either.
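For reference, the full pre-multiplication check sketched out (CERT INT32-C style case analysis; the function name is illustrative). The divisions are arranged so that none of them can itself be INT_MIN / -1:

    #include <limits.h>
    #include <stdbool.h>

    /* True iff a * b would overflow a signed int.  Each branch divides
       a limit by a nonzero operand chosen so the division itself cannot
       overflow; the b < INT_MAX / a branch is the one that catches the
       INT_MIN * -1 corner case. */
    bool mul_would_overflow(int a, int b)
    {
        if (a > 0) {
            if (b > 0)  return a > INT_MAX / b;   /* would a*b > INT_MAX? */
            if (b < 0)  return b < INT_MIN / a;   /* would a*b < INT_MIN? */
        } else if (a < 0) {
            if (b > 0)  return a < INT_MIN / b;   /* would a*b < INT_MIN? */
            if (b < 0)  return b < INT_MAX / a;   /* catches INT_MIN * -1 */
        }
        return false;   /* a == 0 or b == 0 never overflows */
    }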
Dude, YOU have no idea what you're talking about. I'm talking about C, the language, not what one particular implementation gives you. AFAIK, C11 doesn't have a 128-bit int type, so how do you get the full result of a 64x64-bit multiply? How do you access the carry flag? The overflow flag? Etc.
The C language doesn't, and has never, said anything about what the assembled code should look like. If you want performance, you have to use a good compiler. That's true with any language.
You're right, C11 doesn't have a 128-bit integer type; that should be remedied. Fortunately all major compilers (GCC, LLVM, VC++, and ICC) provide one.
Carry: "if (x + y < x)" compiles to code that uses the carry flag.
Overflow: "if ((int32_t) (x + y) < 0)" compiles to code that uses the overflow flag.
And the best part is, code written like this is portable to architectures such as TileGX that don't have status flags.
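Concretely, a sketch of those idioms, assuming unsigned operands (the point contested below) and the nonstandard but widely supported __int128 extension; the function names are invented for illustration:

    #include <stdint.h>

    /* Full 64x64 -> 128 multiply via the nonstandard __int128
       extension (GCC/Clang/ICC on 64-bit targets). */
    uint64_t mul_hi(uint64_t x, uint64_t y)
    {
        unsigned __int128 p = (unsigned __int128)x * y;
        return (uint64_t)(p >> 64);        /* high half of the product */
    }

    /* Carry detection: unsigned wraparound is well-defined, and on
       x86 GCC and Clang typically compile the test to an add followed
       by a carry-flag-using instruction. */
    uint64_t add_carry(uint64_t x, uint64_t y, unsigned *carry)
    {
        uint64_t s = x + y;
        *carry = (s < x);                  /* wrapped iff sum < an operand */
        return s;
    }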
> Carry: "if (x + y < x)" compiles to code that uses the carry flag.
If x and y are signed, you're relying on UB, which the compiler might optimize out.
> "if ((int32_t) (x + y) < 0)" compiles to code that uses the overflow flag.
Again, which type are x and y? If signed, this code is UB. If unsigned, then the result of the conversion to int32_t may not fit into int32_t, in which case the result is implementation-defined or an implementation-defined signal is raised.
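For what it's worth, both tests can be written with no UB at all by doing the arithmetic in unsigned types, where wraparound is fully defined, and only then inspecting the sign bits. A minimal sketch (illustrative name):

    #include <stdint.h>
    #include <stdbool.h>

    /* Signed-addition overflow test with no UB: convert to unsigned
       (well-defined, modulo 2^32), add (wraps, also well-defined),
       then check the signs.  Overflow happened iff a and b have the
       same sign but the sum's sign differs. */
    bool sadd_overflows(int32_t a, int32_t b)
    {
        uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
        uint32_t s = ua + ub;
        return ((~(ua ^ ub) & (ua ^ s)) >> 31) != 0;
    }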
My guess is that Guile uses the native int type to store small numbers, and then switches to bignums, backed by GNU MP, when the native ints are too small. Maybe in that particular case the check fails, and the computation overflows the native int.
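If so, the pattern being described might look roughly like this; a hypothetical sketch using GMP's mpz API (the names and the split are invented, not Guile's actual representation):

    #include <gmp.h>
    #include <stdbool.h>

    /* mul_would_overflow() is the pre-check sketched earlier in the
       thread, widened to long. */
    bool mul_would_overflow(long a, long b);

    /* Multiply two small integers: stay in the native long when the
       pre-check says the product fits, otherwise fall back to a GMP
       bignum. */
    void small_mul(long a, long b, mpz_t result)
    {
        if (!mul_would_overflow(a, b)) {
            mpz_set_si(result, a * b);      /* fits in a long */
        } else {
            mpz_set_si(result, a);          /* promote to bignum */
            mpz_mul_si(result, result, b);  /* result = a * b exactly */
        }
    }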
Fixed by Mark H Weaver, on Fri, 7 Dec 2012.
Avoid signed integer overflow in scm_product
* libguile/numbers.c (scm_product): Avoid signed integer overflow, which modern C compilers are allowed to assume will never happen, thus allowing them to optimize out our overflow checks.
* test-suite/tests/numbers.test (*): Add tests.