
Should I Use Signed or Unsigned Ints? (Part 2) - robertelder
http://blog.robertelder.org/signed-or-unsigned-part-2/
======
w8rbt
One quick side story. Years ago, when I worked in automobile systems
programming, some coworkers had to bring up a CAN bus data extraction system
quickly (less than a week) that worked with all of our programming tools. We
used Java, C#, C, Python and C++ to capture, process and display the data to
end users.

To save time (due to the short schedule), they decided to store all numbers as
64 bit doubles. They reasoned that 64 bit doubles were big enough to hold any
number that might come off the system controllers (some were Intel byte
order, others Motorola) and would simplify the design so that they could write it quickly.

Some time later, we discovered an infrequent bug where occasionally, some of
the CAN variables were correct, but at other times they failed sanity checks
(way outside of the expected values for the specific sensor).

Long story short, the issue turned out to be that 64 bit double assumption.
A double has 52 explicit mantissa bits (the sign and exponent take up the
other 12), so it can hold integers exactly only up to 2^53. Sometimes the
incoming int fit; at other times it was too large and got silently rounded.

The moral of the story is to use the right size (and sign) for each variable
and don't assume that one size will fit all. We had to go back and correct all
the bad values. That took a while. In hindsight, they should have been given
more time to design the system.

~~~
kabdib
Floating point numbers are almost always a mistake. Classic howlers:

"Well, we can represent almost all the values we care about." Until the day
you can't.

"We'll represent time with a double, with the fractional part being part of a
day, and the day being an integer day number from when Somebody Famous Did
Something." Until you realize that the 10AM meeting you want to put in your
scheduler can't be exactly represented.

"We'll put our account balances in there as floats because unlike everyone
else who has done this, _we_ know what we're doing." But you don't. And that
noise you heard was a million accountants crying out in pain.

You wouldn't represent a pointer as "some fraction of the amount of memory the
machine has, from 0.0 to 0.9999", which is my usual retort when someone wants
to smuggle a scalar or structural value in a float.

~~~
roel_v
"Floating point numbers are almost always a mistake."

This. The problem is that it's almost impossible to internalize this without
having first-hand experience - experience which usually takes years to
accumulate. By coincidence, over the last month I had to explain the problems
with floating point numbers to non-technical people on two separate occasions,
and failed miserably each time, because the technical details are too
complicated and the examples too obscure. The 'store account balance in
double, let's put the remainder on our own account' parable is the only
example that sort of works but then they look at you blank-faced and say
'we're not a bank'.

Just venting I guess. It's the simple things in mapping real-world things to
bits that trip me up the most - why is it so hard to represent 'numbers' in
computers? Why is it so hard to represent 'date' in computers? Why is it so
hard to represent 'locations' in computers?

~~~
jbb555
I find it interesting that people don't realize that decimal notation has
exactly the same issues as floating point.

You can't store 1/10 of a £ in a binary floating point number exactly so don't
use it for money.

Well, you can't store 1/3 of a £ in a decimal number exactly either.

The only difference is that people have got so used to decimal that we don't
even notice the issue any more, and choose prices and have systems to work
around the issue. It's so ingrained we don't even know we are doing it.

So don't think that decimals are any better at storing money amounts than
binary floating point, they are not.

 _however_ all of the history and workarounds that we've been using for many
years before computers for this issue also work if you store them in computers
the same way.

That's why you use decimals, it's because they replicate the representation
issues we are used to already so we don't even notice they are there, it's not
that they are fundamentally better in any way.

~~~
roel_v
I think I disagree (depending on what you mean exactly). You say

"That's why you use decimals, it's because they replicate the representation
issues we are used to already so we don't even notice they are there, it's not
that they are fundamentally better in any way."

So decimals _are_ better, because they allow us to represent money amounts in
the exact same way we do in real life, which is what we're modeling in
software.

1/10 = 0.1, an amount you want to be able to represent in a computer. 1/3 =
0.33..., an amount (when talking about money) you don't want to represent
(when we're talking about maintaining balances).

Unless you're saying "decimal as a concept is not philosophically or morally
better than floating point", which is a position to which I suspect pretty
much everybody subscribes.

~~~
emiliobumachar
I think gp's point is that 0.33... is "an amount (when talking about money)
you don't want to represent (when we're talking about maintaining balances)"
precisely because you, and everybody else, are so used to decimal that it
disappears.

There's _nothing_ intrinsic about 1/10 that makes it "rounder" than 1/3,
except as an artifact of the numbering system.

Aliens with three hands with three fingers each, whose societies got used to
base 9, would think 1/3 looks much "rounder", and 1/10 is the one you'd never
want to represent.

~~~
roel_v
No. It's not that a stick of gum costs 5 cents because it's the only fraction of a
dollar that can be represented in cents, it's because pricing is inherently
done in one-cent increments. If there was a need to price single items at a
lower-than-a-cent cost, or at prices between one-cent increments, _and_ if the
need for that outweighed the disadvantages, _there would be a way to do it_.

Other example: an apple doesn't cost 1/3 of a dollar, it costs 33 cents. The
need for pricing things as a fraction is simply not there. I doubt most people
would see 20 cents as '1/5 of a dollar' in the first place.

This might be a difference between people thinking in metric and thinking in
imperial, too. (I'm metric, fwiw).

(before someone says 'I buy resistors that cost less than a cent', please show
me a place that sells them individually)

------
chipsy
It really shouldn't be this false dichotomy. We could have richer
specifications for the bounds of numbers and the rules used to convert their
formats. It's some kind of failure of imagination to not have that when
arithmetic bugs traceable back to "implementation does not match
representation" are so common.

~~~
amenghra
Ada got this right. For example, you can do

    
    
      type Day_Of_Month is range 1 .. 31;

~~~
jmnicolas
What happens when you try to pass an invalid number? Is an exception thrown?

~~~
ygra
Yes.

------
mojuba
One of the most annoying problems in C and C++ (and Objective-C/C++). Most of
the time we deal with quantities, sizes, distances, coordinates, identifiers.
Except for the latter, all others can end up in the negative space even when
we think they shouldn't. Having said that, identifiers and enumerated codes
should also live in the signed realm as they can get converted to other types,
again, in ways we can't always predict (and result in bugs).

~~~
cpeterso
I wish C++ had a better way of representing quantities, something like
Pascal's or Ada's ranged integer types. I started to write a C++ template
class to represent strongly-typed ints. The class grew to 100+ LOC to handle
all the expected operator overloads. Then I gave up. :) C++11 gave us strongly-
typed enums and the means to specify the enum's underlying size (like _enum
class Color : unsigned short_ ). Why can't they give us strongly-typed ints
with something like _int class Fahrenheit_?

------
mcshicks
One thing about two's complement representation is that it is asymmetric: you
have one more negative number than positive. For a 16 bit signed number,
0x8000 is -32768, whose magnitude (32768) has no signed representation. So a
function like absolute value that returns a signed integer cannot handle all
possible input values. I think if
you do any numeric processing using fixed point arithmetic you simply need to
know the range is asymmetric for two's complement rather than trying to pick a
scheme like always using signed or unsigned.

~~~
qznc
There is no "two's complement" in C. The representation of signed integers is
implementation-defined, which is one reason signed overflow is undefined
behavior in C.

Imho it no longer makes sense, since practically all hardware uses two's
complement these days. If the standard declared two's complement for signed
integers, then left-shifting a negative value could have defined behavior, for
example.

------
amluto
> In my case, I was designing a spec for a CPU that was intended to be as
> minimal as possible. The CPU would physically implement signed or unsigned
> atomic math operations, not both. The compiler would support the missing one
> (signed or unsigned) using emulation in terms of the one that was physically
> implemented.

I find that part very strange. Here are a couple of observations:

* Unsigned wrapping binary addition and two's complement addition are identical as long as you don't try to report carry or overflow.

* Unsigned wrapping binary subtraction and two's complement subtraction are identical as long as you don't try to report borrow or underflow. If the goal is absolute simplicity, subtraction isn't needed at all because both cases are efficiently handled by two's complement negation (which is the same thing as calculating 0-x in an unsigned wrapping sense) followed by addition.

* Unsigned wrapping binary non-widening multiplication (the usual case) and two's complement multiplication are identical. This follows from the distributive law: an n-bit two's complement negative number -x is represented as 2^n - x. If you multiply -x (a two's complement number interpreted as unsigned) by y, you end up with (2^n - x) * y, which is y*2^n - x*y. The first term doesn't affect the result, because its low bits are all zero.

Widening multiplication is extremely useful, but implementing the unsigned
version in hardware should make it fairly simple to emulate the signed
version. x86 implements both.

Division is nasty, especially because number theorists and C programmers
disagree on what it means when negative numbers are involved. Really simple
CPUs don't implement division at all, though.

Finally, CPU vendors who actually implement undefined behavior are, in my
opinion, nuts. Give the CPU a nice clean spec that fully defines what happens
and stick to it.

~~~
detrino
I found this odd too. I'd add that comparison and right shift also differ, but
signed right shift does have obvious semantics: sign extension. I'd also argue
that signed division has obvious semantics: round towards negative infinity,
which matches the semantics of right shift with sign extension. I'm saddened
that none of C/C++/x86 have native support for this kind of division, it's
_always_ what I want from signed division.

------
hyperliner
Too many problems with unsigned. Stick with signed.

https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types

~~~
wnoise
That's funny since signed over/underflow isn't defined, while unsigned is.

~~~
modeless
Just because it's defined doesn't mean it's expected. Unsigned overflow can
happen with seemingly innocent subtraction of small values.

------
alricb
Honest question: Would it be possible to create a "friendly" version of signed
that is actually unsigned underneath? After all, it's just bit patterns.

~~~
detrino
On modern computers, both unsigned/signed ints are just bit patterns and it's
only some operations that differ. add/sub/mul/left-shift are the same, but
cmp/div/right-shift differ. Look up twos-complement representation for more
information.

~~~
Sanddancer
Left shift also differs. Raw shifting left can have the effect of turning a
negative number positive by setting the sign bit to 0, so there's a sign
maintaining left shift as well. Also, Left/Right shift of negative numbers
have the opposite effect as they do with positive, with left shifting a
negative number dividing and right shifting multiplying.

~~~
detrino
You are mistaken: left shift is the same for both signed and unsigned, and any
instance of a left shift turning a negative number positive is just normal
overflow.

------
dnautics
if you ever have to implement a softfloat on this desert island, you're going
to want unsigned ints.

------
simonblack
Use whichever works best for a particular variable. If you don't know, you
haven't thought about it.

