
Just fixed a 20-year-old bug… - swah
http://www.azulsystems.com/blog/cliff/2011-08-28-just-fixed-a-20-year-old-bug
======
psykotic

        // Slimey cheap key comparator.
        int32 cmpkey(const void *key1, const void *key2) {
            return (int32)((intptr_t)key1 - (intptr_t)key2);
        }
    

Any C programmer's spidey sense should tingle when they see a ternary
comparison function implemented like this. Let's look at the canonical
example:

    
    
        int intcmp(int x, int y) { return x - y; }
    

It seems perfectly fine. After all, cmp functions are usually allowed to
return any signed number, not only -1, 0 or +1. But what happens if x is
INT_MAX and y is INT_MIN? You get signed integer overflow. That's bad. In
theory, it's really, really bad because signed integer overflow in C is
undefined, so it could launch nuclear missiles, segfault, or give a compiler
like GCC total license to fuck with your code in all kinds of ways that are
hard to imagine. In practice, on 2's complement machines, it's just really bad
since INT_MAX - INT_MIN = -1 doesn't have the right sign.

Returning to Cliff's code, since he's only using the cmp function for equality
and not ordering, the overflow problem doesn't manifest itself. If he were
using it for ordering in a sorted array or search tree, you might think that
this wouldn't cause a bug as long as all he needed was a consistent ordering
rather than any particular one. Well, you'd be wrong, because intcmp violates
transitivity. Mathematically speaking, INT_MIN+1 < 0 < INT_MAX. According to
intcmp, INT_MIN+1 < 0 and 0 < INT_MAX but INT_MIN+1 > INT_MAX. This breaks the
invariants of your sorting and search tree code, so all bets are off.
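To make the intransitivity concrete, here's a small sketch (mine, not from the article) that computes what intcmp's wrapping subtraction produces on a 2's-complement machine. It uses unsigned subtraction so the demonstration itself has no undefined behavior; the final conversion back to int is implementation-defined, but wraps on every mainstream compiler:

```c
#include <limits.h>

/* What intcmp's "x - y" produces on a 2's-complement machine, computed
   with well-defined unsigned subtraction (wraps mod 2^n) instead of
   UB-prone signed subtraction. */
int intcmp_wrapped(int x, int y)
{
    return (int)((unsigned)x - (unsigned)y);
}
```

According to this comparator, INT_MIN+1 < 0 and 0 < INT_MAX, yet INT_MIN+1 > INT_MAX: exactly the transitivity violation described above.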

The solution? Do the simple thing:

    
    
        int intcmp(int x, int y) { return x < y ? -1 : x > y ? +1 : 0; }
    

Your compiler can turn this into a few branchless instructions with a single
CMP.

~~~
stephenhalter
x = INT_MAX & y = -1 is enough to break your intcmp example. There is a wide
range of inputs that would cause a problem.

~~~
psykotic
Yes, it's as you say.

Signed arithmetic is a minefield in C. Here is a case that also explains why
my intransitivity example for simplicity used INT_MIN+1 rather than INT_MIN:

    
    
        // buffer size must be >= ceil(log10(INT_MAX)) + 2
        void itoa(int x, char *buf)
        {
            if (x == 0) { *buf++ = '0'; *buf = 0; return; }
            if (x < 0) { *buf++ = '-'; x = -x; }
            // handle the x > 0 case.
        }
    

This seems like a solid algorithmic strategy for this problem: reduce to the
positive case. Unfortunately things go very badly if x is INT_MIN. You get
another signed integer overflow which in theory is wholly undefined by C. But
in practice, with 2's complement arithmetic, you get x == -x. (As a corollary,
the antisymmetry property intcmp(x, y) = -intcmp(y, x) fails when e.g. x =
INT_MIN and y = 0, so we lose both transitivity _and_ antisymmetry.)

Anyway, the itoa strategy is great on an algorithmic level but the
implementation fails due to messy reality. The proper implementation uses a
cast to an unsigned variable without any explicit negation. This obviously
works on 2's complement machines, and Harbison and Steele's C: A Reference
Manual (p. 190) suggests it is well-defined and portable:

"Except for the type _Bool, the general rule for converting from one integer
type to another is that the mathematical value of the result should equal the
original mathematical value if that is possible. For example, if an unsigned
integer has the value 15 and this value is to be converted to a signed type,
the resulting signed value should be 15 also. If it is not possible to
represent the original value of an object of the new type, then there are two
cases.

If the result type is a signed type, then the conversion is considered to have
overflowed and the result value is technically not defined. If the result type
is an unsigned type, then the result must be that unique value of the result
type that is equal (congruent) mod 2^n to the original value, where n is equal
to the number of bits used in the representation of the result type. If signed
integers are represented using twos-complement notation, then no change of
representation is necessary when converting between signed and unsigned
integers of the same size. However, if signed integers are represented in some
other way, such as with ones-complement or sign-magnitude representation, then
a change of representation will be necessary."
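For concreteness, here is a sketch of that reduce-to-unsigned strategy (my own illustration; the function name is made up). The key move is to convert to unsigned before negating, so the negation happens in arithmetic where wrap-around is defined, and INT_MIN is handled correctly:

```c
#include <limits.h>

/* Sketch of the cast-to-unsigned itoa strategy; itoa_u is a made-up
   name. buf must hold at least 12 bytes for a 32-bit int. */
void itoa_u(int x, char *buf)
{
    unsigned u = (unsigned)x;        /* well-defined, even for INT_MIN */
    if (x < 0) {
        *buf++ = '-';
        u = 0u - u;                  /* unsigned negation: wraps, no UB */
    }
    char tmp[12];
    int i = 0;
    do {                             /* emit digits in reverse order */
        tmp[i++] = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0);
    while (i > 0)                    /* reverse into the output buffer */
        *buf++ = tmp[--i];
    *buf = '\0';
}
```

Note there is no `x = -x` anywhere: the one negation is performed on the unsigned copy, so itoa_u(INT_MIN, buf) produces "-2147483648" rather than invoking undefined behavior.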

Unsigned arithmetic has well-defined semantics in C (wrap-around on overflow,
etc). Unfortunately it has other legacy issues with things like mixed
signed/unsigned comparison: if x is an unsigned int then x > -1 is always
false rather than being always true as you'd expect. In practice I use signed
integers for everything, even variables subject to non-negativity invariants,
except when I need to rely on wrap-around or bitwise operations.

In summary, all this stuff is hard to get right. It's silly that I have to
carry around all this esoterica in my head to write correct code. It doesn't
help that compilers like GCC are increasingly taking advantage of
theoretically undefined behavior in people's code to create (often very
marginal) opportunities for optimization; if you want to be scared shitless,
read this presentation:
[https://www.securecoding.cert.org/confluence/download/attach...](https://www.securecoding.cert.org/confluence/download/attachments/40402999/Dangerous+Optimizations.pdf)

------
swah
Checked out openjdk just to see that file; I thought it would be interesting
to run cloc on hotspot:

    
    
        hotspot $ cloc .
            3742 text files.
            3663 unique files.                                          
             343 files ignored.
    
        http://cloc.sourceforge.net v 1.53  T=28.0 s (120.9 files/s, 37436.2 lines/s)
        --------------------------------------------------------------------------------
        Language                      files          blank        comment           code
        --------------------------------------------------------------------------------
        C++                             712          67857         102162         365876
        Java                           1556          25399          54259         188050
        C/C++ Header                    852          29418          52451         112257
        XML                              77            504            231          15759
        C                                20           1494           2527           8430
        XSLT                              7            564            166           4392
        Assembly                          6             91             36           2256
        HTML                              7            230              3           2010
        Bourne Shell                     75            497           1800           1767
        make                             20            387            841           1579
        Javascript                        2            130            188            917
        D                                 4             83            127            869
        SKILL                             3             74              0            526
        DOS Batch                        42            176           1025            423
        ASP.Net                           1             22              0            141
        Bourne Again Shell                1             27             62            123
        Teamcenter def                    1              0              0              7
        --------------------------------------------------------------------------------
        SUM:                           3386         126953         215878         705382
        --------------------------------------------------------------------------------

~~~
notJim
A couple languages I was unfamiliar with:

\- SKILL: I'm guessing <http://en.wikipedia.org/wiki/Cadence_SKILL>

\- Teamcenter def: Guessing <http://en.wikipedia.org/wiki/Teamcenter>

------
wccrawford
The code was written 20 years ago, but the -bug- didn't appear until people
started writing 64-bit software.

I wouldn't call that a 20-year-old bug... It didn't cause any issues for most
of that time because it couldn't have.

~~~
cperciva
It was a bug. It should have had

    
    
        assert(sizeof(void *) <= sizeof(int32));
    

(or better yet, a compile-time assertion).
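One pre-C11 way to get that compile-time assertion (C11 and later can just use _Static_assert) is the negative-array-size trick; the macro and names here are made up for illustration:

```c
#include <stdint.h>

/* Fails to compile when cond is false, because the typedef'd array
   would then have size -1. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

/* cperciva's check as a build-time assertion. On an LP64 target this
   (correctly) refuses to compile, so a 64-bit port can't silently
   ship the truncating cast:
   STATIC_ASSERT(sizeof(void *) <= sizeof(int32_t), key_fits_in_int32);
*/

/* One that holds everywhere, to show the passing case: */
STATIC_ASSERT(sizeof(int32_t) == 4, int32_is_four_bytes);
```

The failing form is left commented out so the sketch compiles on 64-bit machines; uncommenting it on such a target reproduces the build break that would have caught this bug at port time.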

~~~
alttab
Can you imagine that "20 years ago" (which for most of us is too early to have
been programming at this level) you could miss this, thinking it couldn't
possibly be anything but 32 bits for the purpose it was written for?

That said, never assume, because "dict.cpp" stuff happens all the time, except
that its modern web incarnation is usually called "copy-pasta."

~~~
zwischenzug
Only if you'd been raised on Java.

There had been the shift from 16->32 bit not long before that, so it's not
that much of a stretch, and 8->16 before that (well after I got my first
computer).

We found an almost identical bit-width mismatch bug in a programming language
that caused a bitmap of TCP connections to go haywire on a customer install of
a core piece of software. These bugs cost.

~~~
tsotha
Yeah, in that time period you couldn't get away without writing really
defensively. It was even worse earlier. When I wrote C code for x86 there
were all sorts of different memory models you could choose from at
compilation, and we were producing both 16-bit and 32-bit (for the daring)
versions of our product. You couldn't assume _anything_.

I write java now and this is not something I miss.

------
viraptor
I'm a bit confused... why would that function be faster than `return key1 !=
key2` ?

Not sure how to compile this function to a 16b target, otherwise I'd check
myself :(

~~~
andrewf
Typically, at the assembly level, doing a "compare" will change the contents
of some CPU flags, but not registers. Normally, those CPU flags would then
influence future instructions - most often a "GOTO if this flag is set"
instruction.

In this case, you want to return the result of the comparison from a function.
Since functions return values in registers, you'd need to do some extra work
to copy the result into a register from the flag.

Between compiler optimisations, and the cost of your function call in the
grand scheme of things, the difference is likely negligible. This may, or may
not, have been the case in 1991. Also, casting to an integer pointer rather
than a char pointer means the code has to divide by 4 at least once. (He _did_
write this "optimisation" as an undergraduate!)

EDIT: Actual assembly comparison, from gcc -O3 (Apple's 4.2.x). subl
(subtract) and cmpl (compare) take the same amount of time. But then _compare
has to do setne to store the "not equal" flag into a register, then movzbl
to expand that result from 8 bits (which is what setne writes) to the full
size of an int.

    
    
  _subtract:  // return (int)((char*)key1 - (char*)key2)
        pushl	%ebp
        movl	%esp, %ebp
        movl	8(%ebp), %eax
        subl	12(%ebp), %eax
        leave
        ret
    
    
      _compare:  // return key1 != key2
        pushl	%ebp
        movl	%esp, %ebp
        movl	12(%ebp), %eax
        cmpl	%eax, 8(%ebp)
        setne	%al
        movzbl	%al, %eax
        leave
        ret

~~~
viraptor
Strange, new gcc (4.4.4) produces much nicer assembly for key1!=key2:

    
    
        .LFB0:
            .cfi_startproc
            xorl    %eax, %eax
            cmpq    %rsi, %rdi
            setne   %al
            ret
            .cfi_endproc
    

Clang 2.8 does something similar to your _compare (I expect new versions to be
a bit better)

~~~
ori_b
You're comparing two different assembly languages and calling conventions. x64
!= x86.

~~~
viraptor
You're right. It's been some time since I've looked at any intel-like assembly
:)

------
cpr
Boy, you'd think anything with an (int32) cast would get seriously eagle-eyed
during a 64-bit port...

~~~
pagekalisedown
You'd think the first thing someone would do is grep for all typecasts. hmm

One thing you can do to make your typecasts easier to find, assuming you're
using C++, is to use static_cast. I know it's ugly, but searching for
static_cast is much easier than searching for all the ways a cast can be
written.

------
zvrba
Is it just me who thinks that the OpenJDK's "fix" still has the same bug as
the original code?

------
mypov
So Dr. Click is a mortal after all. He still rocks and his work and thoughts
are hugely informative and influential. Kudos for the upfront self crit.

