
Subtraction is not comparison - pdw
http://www.tedunangst.com/flak/post/subtraction-is-not-comparison
======
ufo
Subtraction _is_ comparison at the assembly level, though. C sadly doesn't give
any way to interact with the overflow flags...
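Standard C before C23 has no portable access to the overflow flag. GCC and Clang do offer `__builtin_sub_overflow` as a compiler extension, and C23 standardizes a similar `ckd_sub` in `<stdckdint.h>`. Here is a minimal sketch, assuming a GCC- or Clang-compatible compiler, of a subtraction-based compare that falls back on the sign of `x` when the exact difference does not fit in an `int`. The name `cmp_sub_checked` is hypothetical.

```c
/* cmp_sub_checked is a hypothetical helper, not a library function.
   __builtin_sub_overflow is a GCC/Clang extension (not standard C):
   it stores the wrapped x - y in *diff and returns true if the exact
   result did not fit in an int. */
int cmp_sub_checked(int x, int y)
{
    int diff;

    if (__builtin_sub_overflow(x, y, &diff)) {
        /* x - y can only overflow when one operand is negative and the
           other is non-negative, so a negative x means x < y. */
        return x < 0 ? -1 : 1;
    }
    return (diff > 0) - (diff < 0);   /* -1, 0 or 1 from the sign of diff */
}
```

This keeps the "compare by subtracting" shape, but the overflow branch is doing the real work. That branch is why the plain relational forms discussed below are usually simpler.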

~~~
haberman
When something is easy at the hardware level but there is no easy way to
express it in programs, it suggests a failure of the abstraction. Or more
positively, room for improvement. I'd love to see some way that this can be
expressed elegantly in C.

~~~
psyklic
The way that C expresses it elegantly is via the comparison operator. The
author is concerned that people try to be clever and use subtraction instead.
Here is how the author corrected this error by using comparison operators
instead of subtracting (in the tree man page):

        int
        intcmp(struct node *e1, struct node *e2)
        {
      -         return (e1->i - e2->i);
      +         return (e1->i < e2->i ? -1 : e1->i > e2->i);
        }

The ternary operator is necessary because in C a relational operator evaluates
to 0 (false) or 1 (true). So the author added a special case for returning a
negative value (indicating the first argument should be sorted before the
second).
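The corrected man-page comparator takes `struct node` pointers. To reuse the same expression with `qsort()`, which passes `const void *` arguments, a thin adapter is needed. This is a minimal sketch, assuming a stripped-down `struct node` that carries only the `int` field `i`. That struct is not the man page's actual definition, and `node_qsort_cmp` and `sort_nodes` are hypothetical names.

```c
#include <stdlib.h>

/* Stripped-down stand-in for the man page's struct node: only the key. */
struct node {
    int i;
};

/* The corrected comparator from the diff above: never subtracts. */
int intcmp(struct node *e1, struct node *e2)
{
    return (e1->i < e2->i ? -1 : e1->i > e2->i);
}

/* Hypothetical adapter so qsort() can call intcmp(). */
static int node_qsort_cmp(const void *a, const void *b)
{
    return intcmp((struct node *)a, (struct node *)b);
}

/* Sort an array of nodes by their i field, ascending. */
void sort_nodes(struct node *nodes, size_t n)
{
    qsort(nodes, n, sizeof nodes[0], node_qsort_cmp);
}
```

With the old `e1->i - e2->i`, inputs mixing large positive and large negative keys (for example `INT_MIN` and `0`) make the subtraction overflow. The ternary form has no such case.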

~~~
caf
An alternative without the ternary is

    
    
      return (e1->i > e2->i) - (e1->i < e2->i);

~~~
xmodem
My C's a little rusty, but isn't the exact integer returned by the comparison
undefined? Isn't it just defined as zero or non-zero?

Therefore I would argue this is just as risky as the original solution, and
that it's better to use the ternary operator.

~~~
caf
No, the relational operators (<, >, <= and >=) are defined to evaluate to
either 0 or 1, and the result has type int.

The same is true of the equality operators (== and !=), the logical negation
operator (!), and the logical boolean operators (&& and ||).

There are other C idioms based on this, like !!expr to squash a zero-or-non-
zero expression down to zero-or-one.
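Here is a small illustration of the guarantee caf describes. Relational and logical operators yield an `int` that is exactly 0 or 1, so `(a > b) - (a < b)` can only be -1, 0 or 1, and `!!` normalizes any non-zero value to 1. The helper names are hypothetical.

```c
/* Collapse any non-zero value to 1 and keep 0 as 0. */
int truth_value(int v)
{
    return !!v;
}

/* Each relational expression is an int equal to 0 or 1, so the
   difference is always -1, 0 or 1 and cannot overflow. */
int sign_cmp(int a, int b)
{
    return (a > b) - (a < b);
}
```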

~~~
mehrdada
Regardless of the correctness of the code, if it's not obvious and you need to
explain it, you probably should not write it (unless you have profile data
that makes you do otherwise).

_"Programs must be written for people to read, and only incidentally for
machines to execute."_

~~~
caf
I think that particular example should be obvious to anyone whose C isn't "a
little rusty".

------
ambrop7
Just do (x>y)-(x<y). It's actually pretty fast:
[http://stackoverflow.com/questions/10996418/efficient-intege...](http://stackoverflow.com/questions/10996418/efficient-integer-compare-function/10997428#10997428)

Disclaimer: not actually my original invention.

~~~
pkhuong
That's a good one for integers and if you don't intend the comparator to be
inlined. For inlined comparison, I prefer if (x == y) return 0; return (x < y)
? -1 : 1;. The control flow is more transparent and C compilers are usually
able to perform the if/if conversion necessary to get the single comparison
you'd expect in hand-rolled sorts or searches.
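Here is a sketch of the inlinable style pkhuong prefers, used inside a hand-rolled binary search. Both `int_cmp3` and `find_int` are hypothetical names. The sketch only shows the shape of the code. It does not check whether a particular compiler actually folds the two comparisons into one.

```c
#include <stddef.h>

/* Three-way compare in the style described above: an equality test,
   then a single less-than. No subtraction, so no overflow. */
static inline int int_cmp3(int x, int y)
{
    if (x == y)
        return 0;
    return (x < y) ? -1 : 1;
}

/* Hypothetical hand-rolled binary search over an ascending int array.
   Returns the index of key, or -1 if key is not present. */
ptrdiff_t find_int(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;                  /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;    /* avoids overflowing lo + hi */
        int c = int_cmp3(a[mid], key);

        if (c == 0)
            return (ptrdiff_t)mid;
        if (c < 0)
            lo = mid + 1;                   /* a[mid] < key: go right */
        else
            hi = mid;                       /* a[mid] > key: go left */
    }
    return -1;
}
```

The midpoint is computed as `lo + (hi - lo) / 2` for the same reason the thread is about: `(lo + hi) / 2` can overflow on very large arrays.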

------
Garlef
From a category theoretic point of view, subtraction is a natural candidate
for comparison: It is the internal hom-functor in the ordered monoid of
integers. One could argue that internal hom functors are the very essence of a
"comparison".

The sole problem seems to be that integers in C are not actually integers.
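One way to unpack the hom-functor remark is a sketch using the standard closed-monoidal-preorder reading, not something spelled out in the comment. Treat (Z, <=) as a preorder with + as the monoidal product and 0 as the unit. The internal hom [x, y] is then the largest z with z + x <= y, which is y - x. Also, x <= y holds exactly when the unit 0 lies below [x, y]:

```latex
z + x \le y \iff z \le y - x
  \quad\text{hence}\quad [x, y] = y - x,
\qquad
x \le y \iff 0 \le [x, y].
```

With fixed-width integers, y - x is computed modulo 2^n, so the last equivalence stops holding as soon as the subtraction wraps.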

~~~
emmelaich
I think you're right but it goes deeper. Integers as represented in computers
are not integers.

------
TwoBit
This isn't a problem with comparison by subtraction, it's a problem with
integer overflow. There's nothing wrong with comparison via subtraction if you
account for overflow, which you always need to do in programming.

~~~
marvy
But the only way to account for overflow is to avoid comparison by
subtraction.

~~~
arielby
You can also do the comparison over larger integers, for example

    
    
      int cmp(int x, int y) {
        return (int) (((long)x-(long)y)>>32);
      }

~~~
Sharlin
With some mainstream compilers (MSVC on 64-bit Windows, for example),
sizeof(long) == sizeof(int) even on x64. You need long long instead. Of
course, in the general case there's exactly zero guarantee that there even
exists an integral type wider than int.

~~~
TwoBit
The C99 language standard requires support for long long.

~~~
dezgeg
But an implementation where sizeof(int) == sizeof(long) == sizeof(long long)
is allowed by the specification.
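Here is a sketch of the widening approach that takes both points into account. It uses long long rather than long, and a C11 `_Static_assert` refuses to compile on an implementation where long long cannot hold every difference of two ints. It also returns the sign of the difference instead of shifting it. As I read arielby's version, `d >> 32` is 0 for every non-negative difference, so "greater" would be reported as "equal" (and right-shifting a negative value is implementation-defined). `cmp_widened` is a hypothetical name.

```c
#include <limits.h>

/* Assumption checked at compile time: long long can represent both
   INT_MAX - INT_MIN and INT_MIN - INT_MAX. The standard does not promise
   this (int and long long may have the same width). */
_Static_assert(INT_MAX <= LLONG_MAX + INT_MIN && INT_MIN >= LLONG_MIN + INT_MAX,
               "long long is not wide enough to hold an int difference");

int cmp_widened(int x, int y)
{
    long long d = (long long)x - (long long)y;  /* cannot overflow, given the check */
    return (d > 0) - (d < 0);                   /* map the sign of d to -1, 0 or 1 */
}
```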

~~~
marvy
On a (slightly) more practical level: there exists some largest integer type.
You may someday need to sort values of it. If you want your sort to always
work, you can't afford to overflow.

------
salmonellaeater
One way to reduce the chance of this problem occurring is to change the
specification of the comparison function to only allow -1, 0, and 1 to be
returned (or even better, use enums like LESS, EQUAL, GREATER). This makes
doing the wrong thing more cumbersome when implementing a comparison function,
but it also makes it easier to implement the actual sort function because jump
tables and switch statements become possibilities.
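Here is a minimal sketch of the enum-based specification, with hypothetical names throughout. C enums are plain integers, so this is a convention rather than something the compiler enforces: `return x - y;` would still compile. The payoff is on the caller side, where a `switch` over the three enumerators is easy to write and GCC's `-Wswitch` (part of `-Wall`) flags a missing case.

```c
/* Hypothetical three-valued result type for comparison functions. */
enum ordering { LESS = -1, EQUAL = 0, GREATER = 1 };

enum ordering cmp_int_ord(int x, int y)
{
    if (x < y)
        return LESS;
    if (x > y)
        return GREATER;
    return EQUAL;
}

/* A caller that dispatches on the result with a switch. */
const char *describe(int x, int y)
{
    switch (cmp_int_ord(x, y)) {
    case LESS:    return "less";
    case EQUAL:   return "equal";
    case GREATER: return "greater";
    }
    return "unreachable";   /* silences missing-return warnings */
}
```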

------
marcosdumay
Nobody handles overflow in practice.

There are all kinds of code patterns that are sensitive to overflow, and
people do use those patterns because they are simple. A program that never
fails on overflow is almost illegible, and for most applications the only
difference it can make is displaying the correct error message before closing,
because the problem domain has no procedure for handling it.

C programmers are implicitly expected to estimate the value ranges they'll be
dealing with, and correctly size their variables.

------
gonzo
Someone needs to explain to Teddy how the compiler implements subtraction.

Hint: his example is fine (and thus, Ted is wrong).

But, since Ted is OpenBSD... here come the downvotes.

As the poster below states, the example in the last link of Ted's posting has
overflow problems.

~~~
cremno
You should read the post again and then open the last link. Then tell us again
that the example is fine.

~~~
gonzo
Last link is just someone being stupid.

~~~
cremno
Then why isn't his example stupid, but fine? They're basically doing the same
bad thing - subtracting two signed integers without caring about possible
overflow.

------
industriousthou
I'm learning JavaScript, so this sort of issue doesn't really make sense to
me, but I gather this is a specific issue with C and it has to do with how
large integers are handled. Fair enough, but I'm actually quite curious how
you should solve this kind of problem in a low-level language and why.

~~~
SamReidHughes
You just write a function that returns -1, 0, or 1. It's not specific to C,
it's in any language that has fixed-size types.

Another reason (a lame one) to return -1, 0, or 1 is that often the _caller_
expects one of those return values for some reason (because the caller is dumb
and your comparison function is replacing another one that behaved that way).
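The same -1/0/1 pattern carries over to any fixed-size type, including unsigned ones. There `a - b` never overflows in the undefined-behaviour sense, but it wraps modulo 2^32 (for `uint32_t`). For a = 0 and b = 3000000000 the wrapped difference is 1294967296, a positive value, even though a < b. Here is a sketch with hypothetical names. The last function is the subtraction idiom, kept only for contrast.

```c
#include <stdint.h>

/* Hypothetical comparators: each returns -1, 0 or 1 and never subtracts. */
int cmp_i64(int64_t a, int64_t b)
{
    return (a > b) - (a < b);
}

int cmp_u32(uint32_t a, uint32_t b)
{
    /* a - b would wrap modulo 2^32 instead of going negative. */
    return (a > b) - (a < b);
}

/* For contrast only: the subtraction idiom on unsigned values. The wrapped
   difference is converted to int (implementation-defined before C23;
   GCC and Clang wrap), and its sign need not match the order of a and b. */
int cmp_u32_by_subtraction(uint32_t a, uint32_t b)
{
    return (int)(a - b);
}
```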

------
seivadmas
WTF? The article states:

    
    
      x = 1987654321 and y = -1987654321? Then the difference between them is -319658654 (negative) which proves that x is less than y. That’s less than correct.
    

Which is completely 100% wrong. Surely the difference would be x - y, i.e.
(1987654321 - (-1987654321)) = (1987654321 + 1987654321) = 3975308642.

Which is perfectly ok, because that's positive and so proves x is greater than
y. So this comparison works just fine for negative integers...

~~~
imron
The writer assumes that the reader knows how integer overflow works in C.

~~~
bluedino
Exactly, don't use subtraction for comparison _in C_.

The previous commenter may have entered that into say, a Python console where
it wouldn't exhibit that behavior.
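Ruby and Python integers are arbitrary-precision, which is why a console prints the mathematically correct 3975308642. With a 32-bit `int` in C, that value exceeds `INT_MAX` (2147483647). Signed overflow is undefined behaviour, but on two's-complement hardware the wrapped result is typically the article's -319658654. Here is a sketch that reproduces both numbers without invoking undefined behaviour. The helper names are hypothetical.

```c
#include <stdint.h>

/* The exact difference, computed in a type wide enough to hold it. */
long long true_difference(int32_t x, int32_t y)
{
    return (long long)x - (long long)y;
}

/* What 32-bit two's-complement subtraction produces: subtract in uint32_t,
   where wraparound is well defined, then reinterpret the bits as signed.
   That final conversion is implementation-defined before C23; GCC and
   Clang wrap modulo 2^32. */
int32_t wrapped_difference(int32_t x, int32_t y)
{
    uint32_t bits = (uint32_t)x - (uint32_t)y;
    return (int32_t)bits;
}
```

For x = 1987654321 and y = -1987654321, the first function returns 3975308642. The second returns 3975308642 - 2^32 = -319658654, the negative value the article warns about.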

~~~
seivadmas
Ah I see the problem. Actually I used a Ruby console.

So what then is the CORRECT way of doing this comparison in C, avoiding
potential overflow pitfalls?

~~~
cygx

        return (x > y) - (x < y);

~~~
vardump
That can be a very expensive operation. Potentially two branches, not counting
the return from the subroutine.

~~~
cremno
If performance really is a concern, qsort() or similar functions probably
shouldn't be used. Instead, use a dedicated function that lets you choose a
specific algorithm (qsort() doesn't have to use quicksort) and doesn't involve
calling a comparison function through a function pointer.
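Here is a minimal sketch of what such a dedicated function might look like: an int-only sort (`sort_ints` is a hypothetical name) where the comparison is a plain `<` the compiler can see, with no function pointer involved. Insertion sort is used only to keep the example short. It is O(n^2) and no substitute for a tuned quicksort or introsort.

```c
#include <stddef.h>

/* Hypothetical int-only sort: the comparison is an inline '<', so there is
   no comparator call and nothing that can overflow. Insertion sort, O(n^2). */
void sort_ints(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;

        /* Shift larger elements one slot right until key's place is found. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```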

~~~
bluecalm
One interesting thing I have learned recently is that GCC can do inlining
through function pointers. That doesn't invalidate your point about a dedicated
function being a better idea for performance, but it's one thing the "std::sort
is faster by design" people often miss.

From GCC documentation:

> -findirect-inlining
>
> Inline also indirect calls that are discovered to be known at compile time
> thanks to previous inlining. This option has any effect only when inlining
> itself is turned on by the -finline-functions or -finline-small-functions
> options.
>
> Enabled at level -O2.

~~~
cremno
But GCC likely isn't able to do that for libc functions like qsort(). Maybe if
LTO is enabled and libc is linked statically, it might.

~~~
bluecalm
I have no idea when it can and when it can't do that to be honest. I tested
qsort vs std::sort on my machine on my data and performance was the same
(Windows, MinGW, GCC 4.8, -flto enabled) but other people reported different
results.

