Should I Use Signed or Unsigned Ints? (robertelder.org)
78 points by dkarapetyan on Aug 1, 2015 | hide | past | web | favorite | 68 comments

As I learned from Haskell, the correct answer really ought to depend on the domain. I ought to be able to use an unsigned integer to represent things like length, for instance.

However, apparently we've collectively decided that we're always operating in the ring of integers mod 256/65,536/etc., rather than in the real world, where we operate there only in rare exceptional circumstances [1]. So instead of an exception being generated when we underflow or overflow, which is almost always what we actually want to happen, the numbers just happily march along, underflowing or overflowing their way to complete gibberish with nary a care in the world.

Consequently the answer is "signed unless you have a good reason to need unsigned", because at least then you are more likely to be able to detect an error condition. I'd like to be able to say "always" detect an error condition, but, alas, we threw that away decades ago. "Hooray" for efficiency over correctness!

[1]: Since this seems to come up whenever I fail to mention this, bear in mind that being able to name the exceptional circumstances does not make those circumstances any less exceptional. Grep for all arithmetic in your choice of program and the vast, vast majority are not deliberately using overflow behavior for some effect. Even in those programs that do use it for hashing or encryption or something you'll find the vast, vast bulk of arithmetic is not. The exceptions leap to mind precisely because, as exceptions, they are memorable.

If your domain is 0..99 for example, using unsigned and <100 is nicer than using signed and two conditionals. Detecting overflows is easier, not harder.

In C, unsigned arithmetic also has less undefined behavior.

And while the enforcement isn't there, it is still better to communicate your intent with the more precise type.

> the correct answer really ought to depend on the domain

This is correct. However, if you use C, your domain is bits and bytes, not abstract things — because this is what C is created to work with. I love using appropriate abstraction levels for different domains, but if your actual problem has nothing to do with systems programming, you probably shouldn't use C in the first place.

In theory, I agree. I mean that seriously, not as a rhetorical sop. However, in practice, it has wrought endless havoc even in C, suggesting, well, exactly my current feelings, which is that it's really insane in 2015 to use C for much of anything. C is the modern-day programmer's SCP-572 [1], except the sharp blow to the head doesn't usually work on a C programmer.

[1]: http://www.scp-wiki.net/scp-572

We have not "decided that we're always operating in ring 256/65,536/etc". We decided that being a little more efficient can save a room in the computer's building. If correctness becomes harder, well, there's an acceptable price. The programmer is always in the building anyway.

An obsolete decision, sure, but I wouldn't call it incorrect.

This raises the question, though: why don't we revert it in new architectures, which are already incompatible with everything? Tradition, I guess?

Overflow checking all arithmetic is still going to be quite expensive, even with hardware support, because you have to choose between:

A) exception/panic on overflow: this is preferable to overflow but also undesirable behavior.

B) arbitrary precision integers: this is nicer, but even today significantly more expensive than ordinary integers.

> A) exception/panic on overflow: this is preferable to overflow but also undesirable behavior.

I don't find this undesirable at all. I have worked in languages where this is the default (Standard ML) and it's quite nice. You can always explicitly use a word-type if you actually want overflow. With hardware support, it is even fairly cheap to implement, as you just connect the overflow bit of your integer ALUs to something interrupt-triggering.

> With hardware support, it is even fairly cheap to implement, as you just connect the overflow bit of your integer ALUs to something interrupt-triggering.

You can exploit the IEEE-754 inexactness exception with doubles if you're okay with 53 bits of integer precision. Tim Sweeney showed me this trick a long time ago, which he combined with run-time code patching so that an arithmetic operation would be speculatively compiled as either fixnum or bignum based on observed run-time behavior. It'd compile it to fixnum by default with an overflow trap using the FP inexact exception. The trap would patch the fixnum site with the equivalent bignum code and resume. With hindsight inspired by modern trace compilers like LuaJIT, the better approach is probably to compile entire traces speculated on either case, so you can hoist most of the guard predicates and not speculate based solely on the proximate call site but on longer traces.

I can't get past the hubris involved with this post. First, did you truly need Haskell to learn that number representations have pitfalls? Look back in history. Problems with numbers did not begin with computers. Nor will they end with them.

Second, you imply that this would have been an easily solvable problem. If only those that came before you hadn't been so willing to shout "Hooray" and make a choice you disagree with.

I don't think you have to have respect for those that came before you. But blithely ignoring their context when they made these choices is... rude. At best. (It also ignores that many early computers did not use two's complement numbers, so it was possible to keep adding to bigger and bigger numbers; you would just run out of space and cause other problems.)

Finally, overflow and underflow are pretty much endemic to all physical things. Ever accidentally pour too much water into a cup? Why didn't you pick the correct cup for the domain of liquid you were dealing with? Overfill a laundry machine? Run out of ink in the pen you were using? Reach for a sheet of paper to find you had none left? ...

Complete newbie programmer naivete, I'm afraid.

Unsigned integers have a big "cliff" immediately to the left of zero.

Its behavior is not undefined, but it is not well-defined either. For instance, subtracting one from 0U produces the value UINT_MAX. This value is implementation-defined. It's safe in that the machine won't catch on fire, but what good is that if the program isn't prepared to deal with the sudden jump to a large value?

Suppose that x and y are small values, in a small range confined reasonably close to zero. (Say, their decimal representation is at most three or four digits.) And suppose you know that x < y.

If x and y are signed, then you know that, for instance, x - 1 < y. If you have an expression like x < y + b in the program, you can happily change it algebraically to x - b < y if you know that overflow isn't taking place, which you often do if you have assurance that these are smallish values.

If they are unsigned, you cannot do this.

In the absence of overflow, which happens away from zero, signed integers behave like ordinary mathematical integers. Unsigned integers do not.

Check this out: downward counting loop:

  for (unsigned i = n - 1; i >= 0; i--)
  { /* Oops! Infinite! */ }
Change to signed, fixed! Hopefully as a result of a compiler warning that the loop guard expression is always true due to the type.

Even worse are mixtures of signed and unsigned operands in expressions; luckily, C compilers tend to have reasonably decent warnings about that.

Unsigned integers are a tool. They handle specific jobs. They are not suitable as the reach-for all purpose integer.

> Suppose that x and y are small values, in a small range confined reasonably close to zero. (Say, their decimal representation is at most three or four digits.)

There's the rub! You need to justify characterizing your values in that way, which means either explicit range checks and assertions, or otherwise deriving them from something that applies that guarantee in turn. And that justification is more work than just making your code correct for every value. I mean, if x and y are both smallish, is x*y also smallish?

The "nearby cliff" is a good thing, in that it makes errors come out during testing rather than a month after you ship. Handwaving about "reasonably close to zero" is begging for trouble.

In the absence of automatic bigints, unsigned integers are easier to make correct.

What bugs have you seen that were caused in part by the choice to use signed integers?

The most dramatic example is the INT_MIN/-1 case, since that causes an outright crash.

For example, Windows has the ScaleWindowExtEx function, which scales a window by some rational number, expressed as the ratio of two ints. Using signed arithmetic is already suspicious: what does it mean to scale a window by a negative fraction? But of course they forgot about the INT_MIN/-1 case, and the result is a BSOD. http://sysmagazine.com/posts/179543/

http://kqueue.org/blog/2012/12/31/idiv-dos/ has some others. Fun stuff.

Every single one of those cases involves integers that necessarily must be signed in order for the interface to work. They're implementing an interpreted language having signed integers, or in the one other case, with ScaleWindowExtEx, it manipulates values that are signed, and that have meaning when negative. The one place I saw an INT_MIN / -1 bug (in code review, iirc) was also for an interpreted language implementing a modulo operator. These are bugs with signed integer usage, but they're not bugs caused by a decision to use signed integers, because in these cases, there was no choice. They aren't representative of what you see when you do have a choice, say, using signed integers for array indices and sizes.

The question of what bugs you actually see was meant to be a personal one, not one of some dramatic bug you've read about on the internet. The answer of which is the better choice is defined by how much damage is caused by one choice versus the other, and you get that answer by noting how frequently you get bugs in practice as a result of such decisions. (Not that thought experiments and imagination have no place, but this is clearly a question for which you can talk yourself into any direction.) For example, I've never had problems with C/C++ undefined behavior of signed integer overflow, while you're spending a lot of time talking about that. I have seen bugs caused by unsigned and signed integer usage that fit into other categories, though.

The ScaleWindowExtEx example certainly has no legitimate reason to accept signed ints.

Personally, the bug I introduce most often with signed ints is a failure to range-check for negative values, e.g.:

    void *get(int idx) { assert(idx <= arr.size()); return arr[idx]; }

The "legitimate reason" is that it's scaling a signed value, that has meaning when negative.

> void *get(int idx) { assert(idx <= arr.size()); return arr[idx]; }

You got problems there even if idx and arr.size() are unsigned.

Without signed integers, the scaling function will turn arguments like (-4, -3) to garbage.

In order to disallow negatives, you need signed arguments, or else to reduce the range. (Say the numerator and denominator cannot exceed 255 or whatever). Otherwise the function has no way of knowing whether argument values (UINT_MAX-3, UINT_MAX-2) are a mistaken aliasing of -4, -3 or deliberately chosen positive values.

Garbage in, garbage out, as they say.

For all we know the ScaleWindowExtEx did have a domain check that disallowed negatives, but it put it after the division.

It permits negatives! SetWindowExtEx permits negatives!

> but what good is that if the program isn't prepared to deal with the sudden jump to a large value

Aaaand you only need to care about this case.

For an unsigned int just check: x < (your max value)

For signed ints: x < (your max value) AND x > 0

Oh, the downward counting loop example. Because that's done very frequently, no? I really don't remember when the last time was that I did that (I'd much rather have an upwards loop and then y = MAX_VALUE - x, adding 1 if needed).

Quite funnily, if you do a loop in x86 assembly, it is naturally downwards counting, if ECX is zero the loop ends

Don't use a for, just i = n; do { i--; /* use i */ } while (i != 0); (assuming n > 0).

> for (unsigned i = n - 1; i >= 0; i--)

A defined-behavior alternative:

  for( size_t i = n; i --> 0; )

size_t is also unsigned (no idea why). The signed equivalent is ssize_t.

Edit: Sorry, missed the "i-- > 0" at first. The code works, but not because of changing "unsigned" to "size_t".

Sizes can be >2 GB on a 32-bit system. Not sure how ssize_t works; is it 64 bits then?

It makes sense to use unsigned for sizes to save a bit (or 32 bits per size). Also, fewer possible invalid inputs to handle.

No worries. I nearly left it "unsigned", but my OCD kicked in. Too many 64-bit conversion warnings stain my psyche...

By the way, ssize_t is POSIX, from <sys/types.h>, not ISO C.

Unsigned overflow is defined but that doesn't necessarily make it any more expected when it happens. It can still ruin your day, and in fact it happens much more often than signed overflow, because it can happen with seemingly innocent subtraction of small values. After personally fixing several unsigned overflow bugs in the past few months I'm going to have to side with the Google style guide on this one.

If you underflow an unsigned you get a huge value that's going to be invalid by any range check, unless you assume the entire unsigned range is available.

But if you use signed, you have to check for values under zero AND the top of the range. So I don't understand how the bugs were caused by unsigned overflow. Wouldn't they still be bugs with negative results in signed numbers?

Mostly the errors are expecting that a calculation is using signed arithmetic when it's not, and getting a large positive value where you expected a small negative value. It's easy to forget the signedness of a variable, but the biggest problem is probably C's unfortunate tendency to silently convert signed to unsigned when at least one unsigned value is involved in a calculation. This causes unsigned arithmetic to contaminate parts of your code that you thought were signed.

I think I have a handful of negative numbers in hundreds of thousands of lines of C code. It's quite rare, IME.

By the way, an underflowed/large unsigned number used where a signed number is expected will in most contexts (e.g. when added as an offset) behave correctly. It will fail when you try to compare it using < or >. If you enable gcc warnings (which you ought to, for every conceivably useful warning :-) ), that will be caught.

This is good general advice.

A key difference is whether the type is meant to be an index (counting) or for arithmetic. Indexing with unsigned integers certainly has its pitfalls:

    while (idx >= 0) { foo(arr[idx]); idx--; }
But this is outweighed by the enormous and under-appreciated dangers of signed arithmetic!

Try writing C functions add, subtract, multiply, and divide, that do anything on overflow except UB. It's trivial with unsigned, but wretched and horrible with signed types. And real software like PostgreSQL gets it wrong, with crashy consequences: http://kqueue.org/blog/2012/12/31/idiv-dos/#sql

> Try writing C functions add, subtract, multiply, and divide, that do anything on overflow except UB.

For those curious...


Well, there are builtin functions that do this safely for signed integers.

Of course that means you need to be aware which calculations are potentially dangerous, you need to know about signed integer overflow problems in the first place, etc. ..

What functions are those?

I totally agree.

Despite the fact that signed overflow/underflow is undefined behavior I'm pretty sure that many more bugs have resulted from unsigned underflow than signed overflow or underflow. When working with integers you're usually working with numbers relatively close to zero so it's very easy to unintentionally cross that 0 barrier. With signed integers it is much more difficult to reach those overflow/underflow limits.

But you have to balance the probability of a problem with its severity. I do a lot of work in C with pointers and arrays, and there are many situations where I select unsigned ints because (although problems are more likely) they are less likely to cause a memory read or write error (for my application, though not necessarily for others). I also often use unsigned when I am doing math that includes logs, because I am less likely to cause a math error.

I would like to be clear that I agree with your assessment for most situations, with some exceptions.

There are a lot of corner cases involved here. The corner cases for signed integers involve undefined behavior, the corner cases for unsigned integers involve overflow. Any time you mix them you get unsigned integers, which can give you big surprises.

For example, if z is unsigned, then both x and y will get converted to unsigned as well. This can cause surprises when you expected the LHS to be negative, but it's not, because the right side is unsigned, and that contaminates the left side.

    if (x < y * z) { ... }
This particular case gets caught by -Wall, but there are plenty of cases where unintended unsigned contagion isn't caught by the compiler. Of course, if you make x long, then y * z will be unsigned, then widened to long, which gives you different results if you are on 32-bit or if you are on Windows. Using signed integers everywhere reduces the cognitive load here, although if you are paranoid, you need to do overflow checking, which is going to be a bear, and you might want to switch to a language with bigints or checked arithmetic.

As another point, in the following statement, the compiler is allowed to assume that the loop terminates:

    for (i = 0; i <= N; i++) { ... }
Yes, even if N is INT_MAX. The way I think of it, your use of "signed" means that you are communicating (better yet, promising) to the compiler that you believe overflow will not occur. In these cases, the compiler will usually "do the right thing" when optimizing loops like this, where it sometimes can't do that optimization for unsigned loop variables.

So I'm going to disagree as a point of style. Signed arithmetic avoids unintended consequences for comparisons and arithmetic, and enables better loop optimizations by the compiler. In my experience, this is usually correct, and it is rare that I actually want numbers to overflow and do something with them afterwards.

All said, if you don't quite get the subtleties of arithmetic in C (yes, it is subtle) then your C code is fairly likely to have errors, and no style guide is going to be a panacea.

  In fact, prior to Java 8, you could not even declare unsigned integers.
In Java 8 you still can't declare unsigned integers. It's just that the API has been extended to provide operations that treat an int or a long as holding an unsigned value.

  This seems to imply that Java leaves the behaviour of signed integer overflow
  up to the underlying hardware, but guarantees a two's complement
  representation even if the architecture does not.  It seems reasonable to
  expect that it would simply wrap to 0 (I wasn't able to find a more
  conclusive reference for this).
In fact, the Java Language Specification specifies what happens in case of overflow very precisely. For example, for multiplication (section 15.17.1):

  If an integer multiplication overflows, then the result is the low-order bits
  of the mathematical product as represented in some sufficiently large
  two's-complement format.
And for addition (section 15.18.2):

  If an integer addition overflows, then the result is the low-order bits of the
  mathematical sum as represented in some sufficiently large two's-complement
  format.
The Java approach, without undefined behaviour, would seem to be quite convenient for a programmer, but that does not seem to be the case in practice. Of course you gain the ability to check for overflow after performing the operation, which is nice, but at the same time you lose the straightforward way to detect bugs provided by undefined behaviour (UB).

If an application executes UB, then it is obviously wrong; thus, simple instrumentation of arithmetic operations followed by a check for overflow gives a way to detect such problems without false positives (see for example -fsanitize=undefined and similar options available in modern C/C++ compilers).

IMHO only a very small fraction of overflow bugs would have been fixed by just wrapping the result around.

> Of course you gain ability to check for overflow after performing the operation,

Using exceptions?

> IMHO only very small fraction of overflow bugs would have been fixed by just wrapping result around.

Yes, switching to wrapping integers is not the solution to overflow. Rather check the operation beforehand. I still think using unsigned integers is always preferred if you don't need negative values.

What I had in mind is the fact that with defined overflow you could unconditionally perform the operation, and then observe the result to decide whether overflow has in fact happened. For example, the following pattern to check whether adding 100 to an integer 'a' overflows would be valid:

  int a = ...;
  if (a + 100 < a) overflow
So this is one of the cases where wrapping behaviour could potentially fix bugs in existing applications. Obviously, rewriting this to cope with overflow is trivial if you are already aware of the undefined behaviour that could occur.

How would that fix existing code without modifying it?

Suppose somebody has written the above code. Then under the current C++ standard it would invoke UB on overflow. But if the standard were changed to require wrapping behaviour for int, then it would be "fixed", i.e., do what the programmer intended it to do. Similarly if you use some compiler-specific option to ensure wrapping behaviour for signed integers, like GCC's -fwrapv.

Oh, you are assuming someone actually wrote that code. That is circular reasoning.

It would be much better to write proper checks or just call functions that do so.

While the signed vs unsigned int question doesn't concern me much[1], I really appreciate this post because I discovered the One Page CPU. The author nerd-sniped me three paragraphs in. Now I want to try to implement this on an FPGA. I think my next few weekends will be taken up with getting a real world implementation working and getting some examples working. Thanks.

[1] If I was designing a system that could only have one type of int (but why?) I'd use unsigned ints and if I needed to represent negative numbers, I'd use two's complement which is fairly well behaved, much as the author points out. This is fairly common in the embedded world.

Seems like it boils down to using unsigned when you need to treat numbers as fields of bits, and signed when you need to do arithmetic.

Not experiencing undefined behavior is a desirable thing for arithmetic so unsigned has some appeal for arithmetic too, I'd say.

That seems like a false sense of safety: when doing arithmetic, your program is most likely not going to be handling overflows correctly anyway. Also, when using unsigned ints, any code involving subtractions can lead to subtle bugs that only occur when one value is larger than another. I'd recommend sticking with signed ints for all arithmetic code.

It might be pretty safe if you're only ever adding, multiplying or dividing with other unsigned numbers. Once you start doing subtraction it could get ugly.

> when doing arithmetic, your program is most likely not going to be handling overflows correctly anyway

Well with that attitude!

Funnily enough the bash crash in the comment still crashes bash on OSX 10.10.4:


Probably because Apple hasn't really updated bash since the license switch to GPLv3.

> Assume sizeof(int) == sizeof(unsigned int) == 4
>
> g << h: Well defined because 2147483648 can be represented in a 32 bit unsigned int

Even under those assumptions, it is implementation-defined whether unsigned int can hold the value 2^31. It is perfectly valid to have a UINT_MAX value of 2^31-1. In that case the code will cause undefined behavior.

The only guarantee for unsigned int is that its UINT_MAX value is at least 2^16-1, regardless of its bit size, and that it has at least as many value bits as a signed int.

For example C allows these ranges:

int: -2^31 , 2^31-1

unsigned int: 0 , 2^31-1

Use signed because pointer differences in C are signed:

    char *a, *b;
    size_t foo = a - b; /* generates a warning with -Wconversion */
This tells me that C's native integer type is ptrdiff_t, not size_t. (And I agree this is crazy: unsigned modular arithmetic would be fine, but they chose a signed result for pointer subtraction).

Why care about this? You should try to get a clean compile with -Wconversion, but you should also avoid adding casts all over the place (they hide potential problems). It's cleaner to wrap all instances of size_t with ptrdiff_t; you can even check for signed overflow in these cases if you are worried about it.

There is another reason: loop index code written to work properly for unsigned will work for signed, but the reverse is not true. This means you have to think about every loop if you intend to do some kind of global conversion to unsigned.

> (And I agree this is crazy: unsigned modular arithmetic would be fine, but they chose a signed result for pointer subtraction).

But. Signed means that i < j implies that p + i < p + j, or that p < q implies that p - r < q - r.

Because properly handling overflow if you're using signed values involves UB, it's super hard to get it right. Even Apple has struggled with it: http://www.slashslash.info/2014/02/undefined-behavior-and-ap... If you get it wrong you can also see your program's behavior change with the compiler optimization level, which is incredibly frustrating (and causes a lot of "the compiler is buggy!" beliefs).

For what it's worth, all the bugs I've seen related to integer representation (not just over/underflow; e.g. I've seen code casting a 32-bit pointer to a 64-bit signed integer) could have been fixed by changing int to unsigned, and never the opposite. Of course, this could be just the effect of the vast majority of programmers choosing int as the default integer type.

I've seen people 10 years my senior cast a 32-bit pointer to an int32_t. It's funny because it's not difficult to get right, but nobody ever seems to bother.

This was in a game engine, written by a guy who is adored here; if I said his name, I'd probably be downvoted into the negative :)

Some practical notes for handling/avoiding signed integer overflow:


Use the most appropriate type. If you don't know what that is, use `int`. My general rule of thumb is "use unsigned integers when interfacing with hardware, otherwise use int.". Sometimes you need int64_t. You usually don't.

The problem with unsigned integers is that most people, past me included, think that they prevent nonsense negative values. They don't. The compiler will quite gladly let you call a `void func(unsigned x)` with -1 and let you have "fun" later debugging how and why this function got called with an enormous value.

And, of course, that's not the only source of negative numbers that get passed in by mistake. More often it's the result of a subtraction that can "never" be negative.

Unsigned int's for storing IPv4's FTW. Or conversely, storing IP's as signed int's will make you sad.

Use unsigned.

All of the arithmetic operations (apart from division by zero) are defined on unsigned int in C. This is not true of int.

Compilers are now starting to do all manner of nasty optimizations based on undefined behavior. It is now only a matter of time before a signed int burns you.


No matter whether you use signed or unsigned types, be sure you handle both underflow and overflow. From my experience, signed integers' main usages are integer arithmetic (e.g. accounting for things that can be negative or positive), pointer arithmetic, error codes, or multiplexing two different kinds of elements on the same variable (so a negative value has one meaning and a positive one a different meaning). Unsigned is for handling countable elements, with 0 as the minimum. The good point of using unsigned integers is that you have twice the available codes, because of the extra bit.

Example of counting backwards using unsigned types (underflow check: -1 is 111...111 in binary two's complement representation, so -1 is equivalent to the biggest unsigned number):

        size_t i = ss - 1, j = sso;
        for (; i != (size_t)-1; i--) {
                switch (s[i]) {
                case '"': j -= 6; s_memcpy6(o + j, "&quot;"); continue;
                case '&': j -= 5; s_memcpy5(o + j, "&amp;"); continue;
                case '\'': j -= 6; s_memcpy6(o + j, "&apos;"); continue;
                case '<': j -= 4; s_memcpy4(o + j, "&lt;"); continue;
                case '>': j -= 4; s_memcpy4(o + j, "&gt;"); continue;
                default: o[--j] = s[i]; continue;
                }
        }

Example of computing a percentage on unsigned integers (the same idea can be applied to signed) without losing precision or requiring a bigger data container (overflow checks):

        size_t s_size_t_pct(const size_t q, const size_t pct) {
                return q > 10000 ? (q / 100) * pct :
                                   (q * pct) / 100;
        }

Also, for cases where overflow could happen, you can handle it doing something like this:

        sbool_t s_size_t_overflow(const size_t off, const size_t inc) {
                return inc > (S_SIZET_MAX - off) ? S_TRUE : S_FALSE;
        }

        size_t s_size_t_add(size_t a, size_t b, size_t val_if_saturated) {
                return s_size_t_overflow(a, b) ? val_if_saturated : a + b;
        }

        size_t s_size_t_mul(size_t a, size_t b, size_t val_if_saturated) {
                RETURN_IF(b == 0, 0);
                return a > SIZE_MAX / b ? val_if_saturated : a * b;
        }

P.S. examples taken from:

https://github.com/faragon/libsrt/blob/master/src/senc.c (underflow example)

https://github.com/faragon/libsrt/blob/master/src/scommon.h (overflow examples)

> size_t i = ss - 1, j = sso; for (; i != (size_t)-1; i--) {

The idiomatic way is

    for (size_t i = ss, j = sso; i-- > 0; ) {

Yes, that way is for C99 and later.

Or the trivial C89 translation. Still better than having to cast -1.
