Hacker News new | past | comments | ask | show | jobs | submit login
Two's complement – You beauty (everyogi.in)
123 points by yogeswarant on Jan 11, 2017 | hide | past | web | favorite | 86 comments

This all is suddenly less surprising for people who learned some modulo arithmetics in school (or university). That is, calculating with the remainders of division. For example:

- Calculating "modulo 60" means calculating time with a round clock in mind, considering only minutes and ignoring hours (and seconds).

- Calculating with angles (in degrees) means just calculating "modulo 360".

- "modulo 1000" means calculating with the last 3 decimal digits of an integer, ignoring the front digits.

The fundamental result here is: No matter modulo which number you calculate: addition, negation, subtraction and multiplication work "out of the box". And you'll quickly notice that "two's complement" just means calculating modulo 2^n, where n=8,16,32,64 or 128. But this all really works for any m >= 2, not just m = 2^n.

(One drawback though: division doesn't work here, it works only if m is prime, and even then it is slightly different from what you'd expect, although completely logical.)

In short, the elegance comes from modulo arithmetics. It has nothing to do with "two" or "binary", it would e.g. work with 3-state logic machines the same way.

EDIT: To those who downvoted this: Do you care to elaborate? The author did't mention modulo arithmetics with a single word, although it is an essential part to truly understand how and why two's complement works.

I didn't downvote you. However, I am a math person and your post feels wrong for some reason. It isn't wrong, but it feels like it should be; and I am not entirely sure why.

CPUs use modular arithmetic for both signed (two's complement) and unsigned values, so that cannot be the defining feature of twos complement. The insight with two's complement is that we can interperet the "upper" half of numbers as negative. As you say, this should be very familar to anyone who has worked with modular arithmatic. For those who have not, this just means that (in 3-bit twos complement/ integers mod 8). We interperate 7 = 8-1 = -1 and so on. As a result of this, operations on signed and unsigned integers are literally identical. Almost.

CPUs differ from modular arithmetic in 1 way: multiplication. Specifically, CPUs do not do modular arithmetic for multiplication. However, the result of the multiplication of two n-bit numbers could be as big as 2n-bits. When CPUs do this multiplication, they store the result in two registers. If you only look at the bottom register, the result is equivalent to modular arithmatic. However, if you consider the top register, the result is not (this is why there is a MUL and IMUL instruction).

Arguably, the same thing happens for addition. However, because there is at most 1 extra bit needed, it is not given its own register, but rather, the upper digit is stored in the carry flag.

The other insight of two's complement is encoding. Under the construction I presented above, if given the 3-bit twos complement number 0b111, we would have to compute that:

    0b111 > 0b011
    0b111 = 0b1000 - 0b0001
The insight of two complement is a way performing this computation using primitive bitwise operations. Specifically:

   If the first bit is 0, we are done
   Otherwise, perform bitwise negation, add 1, and consider the result "negative". 
There is no obvious equivelent of this method for other bases.

> CPUs use modular arithmetic for both signed (two's complement) and unsigned values, so that cannot be the defining feature of twos complement

I disgree, because this still fits nicely into modulo arithmetics.

Signed versus unsigned just means that we choose a different set of representants for certain operations (such as). For unsigned, we use the smalles non-negative representant. For signed, we use the representant with the smallest absolute value (and prefer the negative one if there is a tie). Still, nothing with binary.

Except for one single detail: The "tie" is solved in favour of the negative number, because that way, the first bit always denotes the sign. This little details is binary-specific, but that's it.

> CPUs differ from modular arithmetic in 1 way: multiplication. Specifically, CPUs do not do modular arithmetic for multiplication. However, the result of the multiplication of two n-bit numbers could be as big as 2n-bits. When CPUs do this multiplication, they store the result in two registers. If you only look at the bottom register, the result is equivalent to modular arithmatic.

Good point, but in most (non-assembly) code that I see, the result of a multiplication is stored in a same-size integer. So I'd argue this is used as much. I agree that this is still a difference, though.

> The insight of two complement is a way performing this computation using primitive bitwise operations.

I believe that negation is not what this is all about. To the contrary, the negation is more complicated for two's complement than for other representations. For example, in other representatios you just flip a single bit to negate a number.

No, the point is that there no special cases for increment, decrement, addition, subtraction and multiplication (with the small difference discussed above). And there is not even a difference between signed versus unsigned arithmetics except for comparison (also discussed above). This is what works out perfectly well in modulo arithmetics, and has nothing to do with binary.

"Two's Compliment represents a linear automorphism over a cyclic group" is what I think you're trying to say. russdill & co are hinting that Modular Arithmetics are cyclic groups.

I think for most people, the aha-moment comes when they realize One's Compliment double-counts the zero. In contrast, linearity is simply assumed. Otherwise, why would anyone make a number system out of it?

  > However, the result of the multiplication of two n-bit numbers
  > could be as big as 2n-bits.
For unsigned multiplication. For signed two's complement multiplication, it's at most 2n−1 bits, and that is only when multiplying the two most negative numbers; otherwise it's at most 2n−2 bits.

~~~ wavy line flashback ~~~

A few years ago, I did a C code generator for a 2's complement 16-bit processor that had mostly adopted the convention that 0x8000 was an invalid sentinel value, so that integers would have a symmetric range around zero, ±0x7FFF. This was mostly a software convention (i.e. the ALU didn't trap or anything like that) except that, given that 0x8000 was 'unused', the multiplier only produced a 30 bit result. This made C 'long' multiplication interesting.

Another way to do signed comparisons (I think) is to simply invert the top bit and do an unsigned comparison. For the 3-bit example this is equivalent to adding 4 to both numbers which will shift the range from (-4,3) to (0,7).

So in hardware this amounts to routing one bit of the instruction to be XORed with the sign bits of the operands. You then get signed and unsigned comparison. It's these trivial hardware implementations that made 2's complement the standard way of doing things. using 2 gates to implement a variation on an instruction is awesome.

edit: it would not surprise me if there is an even simpler way to do it.

I didn't downvote, but the article was comparing one's complement to two's complement. One's complement also requires arithmetic modulo 2^n, so nothing you've said is unique to two's complement or helps to explain why it works.

With two's complement, you flip all the bits, and then add 1 to the result. The surprise is that you can shift the negative half of the circle by 1 and everything still works. And not only does it work, but it has advantages over not shifting.

This can indeed be done in any base, but you always have the choice of two complements to use. In decimal you have 9's complement and 10's complement. So, you're right about it working similar in trinary, but the name 2's complement is in fact somewhat specific to binary.


> The surprise is that you can shift the negative half of the circle by 1 and everything still works

But this, again, has has nothing to do with binary but all with modulo arithmetics. If you think in modulo arithmetics, this is still no "surprise".

For example, if you calculate modulo 1000 you don't distinguish between 777 and (777 + 1000x) for any integer x. That is, the following numbers are treated the same:

777, 1777, 2777, ..., 4131321777, ...

but also (777 - 1000 = -223):

-223, -1223, -2223, ...

These all represent the same number (modulo 1000). Modulo arithmetics tells you that for almost all operations it doesn't matter which representative you use (i.e. addition, subtraction, increment, decrement, etc.).

The only difference between signed and unsigend is which representatives you use.

"Unsigned" means: For each class, use the smallest non-negative representative: 0,1,2,...,999

"Signed" means: For each class, use the representative with the smallest absolute value (and use the negative one on tie): -500,-499,...,0,1,...,499

The only binary-specific thing here is how the tie is resolved in the signed case. We prefer -500 over 500 (both are equal modulo 1000), because in binary, that way the first bit always indicates the sign.

But if you are fine with a slightly more complicated sign check, you could als well define the following, where addition, etc. still all work the same way:

"Signed-2": For each class, use the representative with the smallest absolute value (and use the positive one on tie): -499,...,0,1,...,499,500

You're still talking about the modulo circle, and not two's complement relative to 1's complement. Everything you just said having to do with the modulo circle applies to 1's complement. Do you know the difference between 1's complement and 2's complement, and can you explain that difference? Your comments make me suspect you might not understand what 2's complement is. I don't mean that to be insulting, I was only trying to be helpful and explain what my perception of your first comment was, and guess at why people might have downvoted it, since you asked. I realize that suggesting you might be misunderstanding, even if you do understand it, can feel very irritating, and I apologize for that in advance.

Everybody knows about the modulo part, everyone reading this understands that you have a set number of bits, and that positive numbers wrap around to negative ones. That is not what's interesting here, and it is not the key differentiating factor to understanding two's complement, when comparing it to one's complement.

> The only binary-specific thing here is how the tie is resolved in the signed case.

This doesn't feel accurate to me. I'm not exactly sure what you mean by a "tie". But for 1's complement, you have two representations for 0. With 2's complement, you have one representation for 0. The representation for all negative numbers in 2's complement are offset by 1 from the same negative number in 1's complement. Yet both systems allow you to add together two numbers, mixing positive & negative, and get a result that is valid for the system. Why? Why are there two different ways? What happens when I overflow in each system? How does multiplying and dividing work in each system? Those are the interesting questions with complements.

I didn't downvote, but just a minor nitpick, it's modular arithmetic, not modulo arithmetics.

You'd love finite fields.

And groups!

Don't forget about rings.

I always found those ideal.

But the base of all is still Gröbner.

"Division" i.e. multiplying by the multiplicative inverse doesn't require the modulus be prime; it just requires that the number you are finding the inverse of and the modulus are relatively prime.

I program a UNISYS 2200 mainframe at work. It uses 1's complement.

Yes, there are two zeros. Not a problem in practice because all arithmetic operations normalize -0 to +0 at no extra cost in execution time, so -0 practially doesn't happen. IIRC from Assembler class, addition is implemented as subtraction of the negative operand. Just in case anyone ever needs it, e.g. for bitmaps, the SZ (store zero) assembler instruction is complemented by a SNZ.

Programming in a high level language, the representation of negative numbers is all but transparent to the programmer. When reading dumps, not having to perform an extra addition (subtraction?) when changing a number's sign is pretty sweet! Also, abs(-MAXINT) == (+MAXINT), reliably. The asymmetry of number ranges always bothered me on 2's complement machines.

What _is_ annoying about the UNISYS boxes is the 36 bit word format, though. Characters are stored in 9 bit quarterwords that map pretty awkwardly to bytes containing 8-bit ASCII. Binary data formats are essentially incompatible with anything.

> What _is_ annoying about the UNISYS boxes is the 36 bit word format, though. Characters are stored in 9 bit quarterwords that map pretty awkwardly to bytes containing 8-bit ASCII. Binary data formats are essentially incompatible with anything.

This is why the FTP protocol has a byte size command. If all you have is 8-bit bytes then that seems strange. But at the time FTP was designed the most common machines on the ARPANET had 36-bit words (mostly PDP-10s and their derivatives) and bytes (the term was used in the more general sense) were just bit strings of 1-36 bits. 7-bit ascii was common (5 characters would fit in a word, like my username GUMBY), as were six bit bytes (pack six characters into a word). I never used 9-bit characters though arrays of nine-bit bytes were not unreasonable.

BTW the PDP-10 had 18-bit addresses so each word of memory held a Lisp cons; CAR, CDR, RPLACA etc were machine instructions. Gordon Bell and Alan Kotok designed the -10 (and its predecessor the PDP-6) with Lisp in mind. The first Lisp Machines.

> Binary data formats are essentially incompatible with anything.

Well, that's true today, but look at it the other way around: Unix was really developed for an 8/16-bit machine. It was a reimplementation of Multics that ran on a 36-bit machine (GE 645 & Honeywell 6180) written in PL/1. Unix was famously written for the PDP-7 (an 18-bit machine) but it was written in assembly. The famous PDP-11 version was written in a BCPL derivative you might have heard of called "C" and, since PL/1's level of machine abstraction was still new, the derivative modeled the PDP-11 architecture. So nowadays all CPUs are C machines and C runs well on them. Probably the most common non-PDP-11-like machine most programmers will program these days is a GPU.

> 7-bit ascii was common (5 characters would fit in a word, like my username GUMBY), as were six bit bytes (pack six characters into a word)

There were a bunch of different six bit character encodings, often (though not always strictly correctly) called "BCD". The horror show of IBM's EBCDIC was an eight bit extension of one of these.

Then there was 5 bit Baudot code, and...

The last time I checked, many *nix systems will still assume that you're on a 5 bit Baudot (uppercase only) teletype (i.e., a genuine physical tty) if you attempt to log in using all uppercase in your user name.

Some systems hacked in more characters by having special "shift in" and "shift out" characters. If a "shift in" character appeared in the stream, the system would switch to the alternate character set until a "shift out" character was received.

  > … *nix systems will still assume that you're on a 5 bit Baudot (uppercase only) teletype …
Akshully the original 1963 version of ASCII¹, which was a 7 bit code but did not include lower case. The Model 33 teletype² (one of the terminals used by UNIX developers³, and probably a contributing factor to two-character command names) was a 1963-ASCII device. Even after 1967 ASCII added lower case, popular low end video terminals⁴ did not include it so that they could get away with 6 bits worth of printable character ROM.

¹ http://worldpowersystems.com/archives/codes/X3.4-1963/index....

² https://en.wikipedia.org/wiki/Teletype_Model_33

³ https://commons.wikimedia.org/wiki/File:Ken_Thompson_(sittin...


There are also other interesting uses for odd word lengths. For example, many UARTs support word lengths from 5 to 9-ish bits (some do more). This is commonly used to implement out of band signalling for protocols running over these, eg. using 9 bit words, where the ninth bit tells whether this is the start of a command frame. More handily even, in most MCUs this is already correctly separated, ie. there is a byte register with the lower 8 bits and then there is a different register for the remaining bits.

Or using the 9th bit to indicate that the rest of the byte is an address . Some PICs UARTS had an interruption that is triggered when the 9th bit is on.

The example C function in the article uses 8 instead of CHAR_BIT, which today seems not worth complaining about. I never knew there used to be so many machines with non-8-bit bytes.

> I never knew there used to be so many machines with non-8-bit bytes.

It's not like the 8 bit byte was the initial default and others were experiments.

FWIW I believe the 8-bit byte was an IBMism, and for whatever reason I don't remember IBM machines being particularly popular on the ARPANET, which was a research network.

Although its arrival well predated me I do remember a conversation with someone in which we were surprised by how it was becoming common to see people assume that a byte was a fixed 8 bits. I think that was entirely due to the spread of the Vax.

Yes. The early situation was very different; see e.g. http://www.ed-thelen.org/comp-hist/BRL-III-B.html from 1955.

The IBM 360 (1964–) was the killer 8-bit-byte machine.

> CAR, CDR, RPLACA etc were machine instructions. Gordon Bell and Alan Kotok designed the -10 (and its predecessor the PDP-6) with Lisp in mind. The first Lisp Machines.

The Stanford PDP-6 apparently even had a CONS instruction at opcode 257 :)

> When reading dumps, not having to perform an extra addition (subtraction?) when changing a number's sign is pretty sweet!

So I've not worked with 1's complement machines, but how do you add a negative number to a positive number?

For instance, 5 + -3 in one's complement would naively be:

  + 111100
    000001 instead of 2
Doesn't the machine have to perform some fixup for negative numbers in this more common case, instead of a fixup during the arguably less common negation?

I don't have detailed knowledge on how UNISYS microcode works, but I was told that addition was implemented as subtraction of the negated number.

So the "real" implemented instruction was subtraction, while addition wasn't directly implemented. The "fixup", meanwhile, is just flipping all the bits, and this can be achieved, I suppose, more quickly than adding-with-carry over 36 bits.

> I program a UNISYS 2200 mainframe at work.

This sounds awesome. I had no idea UNIVAC is still alive.

I programmed UNISYS 1100's in 1980. Awesome beasts.

The python result should be expected. Python's integer type isn't sized, that is python will happily give you factorial(100), despite it being much larger than 64 bits. It can't then give you twos complement, because it can't know the size with which to complement the two.

That's a reasonable tradeoff for Python, but it's worth noting that there is a way to deal with this.

Quoting from http://reference.wolfram.com/language/tutorial/IntegerAndNum...

    Bitwise operations are used in various combinatorial
    algorithms. They are also commonly used in manipulating
    bitfields in low‐level computer languages. In such
    languages, however, integers normally have a limited
    number of digits, typically a multiple of 8. Bitwise
    operations in the Wolfram Language in effect allow
    integers to have an unlimited number of digits. When an
    integer is negative, it is taken to be represented in
    two's complement form, with an infinite sequence of ones
    on the left. This allows BitNot[n] to be equivalent
    simply to (-1 - n).

Yes, but how would you print that infinite sequence of 1s?

same as the infinite number of zeros in front of a positive number.

Just trimming 1s would be ambiguous. Is b11 3 or -2? You would have to add a 0 prefix to all positive numbers or some other to negative.

Just prefix a + or - just like you do for base 10.

So then it would not make much of a difference anymore with what Python does. Which was what they wanted to avoid.

Indeed. For this use case to_bytes would be correct (writing this in the most verbose way possible :)

    >>> (-352).to_bytes(length=4, byteorder='big', signed=True).hex()
    >>> (352).to_bytes(length=4, byteorder='big', signed=True).hex()
Note how well-defined and independent of the actual machine this conversion is. Since you define everything - length, byte order and whether to get two's complement or not - you'll get the same output everywhere.

Thanks. Good to know this.

Bitwise opetations assume an infinite number of 1s on the left for negative numbers (2s compliment) e.g., ~-1 == 0 (-1 is an infinite number of 1s that are converted to 0 by ~ (invert) operator ).

  3 == 011
  2 == 010
  1 == 001
  0 == 000
  -1 == ..111
  -2 == ..110
  -3 == ..101

Looking at limited-digit odometer style devices gives the best example of why 2's complement is saner.

  ...    or  ...
  9998       1110
  9999       1111
  0000       0000
  0001       0001
  0002       0010
What number is before 0 in binary? 1111. So that's where -1 is. The number before that? 1110, so that's -2. The whole XOR + 1 thing can be derived from this shape.

That, and simple addition of both signed and unsigned numbers actually works. :)

The only question is where you draw the line between underflowing negatives and overflowing positives, and going halfsies on the top bit seems to make sense. For an 8-bit number, there are 128 numbers on each of the negative/non-negative split, but zero mucks it up by being not mathematically positive. 1's complement evens it out by having 127 numbers on each side plus two zeros, but messes up signed math.

the https://en.wikipedia.org/wiki/Method_of_complements article explains a more fundamental piece of information about this operation - i often see articles explaining two's complement, but doesn't say anything about this general 'complement' method (which works for all bases, not just binary).

> In the decimal numbering system, the radix complement is called the ten's complement and the diminished radix complement the nines' complement. In binary, the radix complement is called the two's complement and the diminished radix complement the ones' complement. The naming of complements in other bases is similar. Some people, notably Donald Knuth, recommend using the placement of the apostrophe to distinguish between the radix complement and the diminished radix complement. In this usage, the four's complement refers to the radix complement of a number in base four while fours' complement is the diminished radix complement of a number in base 5.

That makes sense. The names one's complement and two's complement are kind of confusing otherwise, since they actually refer to totally different things, and it's not clear how they would generalize to higher bases.

Thanks for the link, the article mentions usage of complements in machines going back to mechanical calculators and the first example uses 9's complement. Neat!

> There would be two ways to represent 0, as +0 and -0.

IEEE 754 floating point actually has this (due to having a dedicated sign bit). I think most code doesn't care (IIRC they are defined as equal for comparison purposes even though the bits are different in memory), but apparently it's sometimes handy to have for some functions that have a discontinuity at zero or otherwise need to preserve the sign through a multiplication by zero.

By the time you have implemented enough silicon for floating point addition and multiplication, the amount of extra transistors you would need to special case the compare operator's zero case is realtivily tiny.

The same can not be said for an interger alu (especially one without hardware multipliers or even arbitrary bit shifts), where twos complement representation can save a much larger percentage of silicon.

Because of this, javascript has a +0 and a -0. They make a fun trivia question because there are very few ways to distinguish them since most of the ways of checking equality (even ===) will report that they are equal.

In fact, I only know two ways to distinguish them: divide something by them, and you get positive infinity for +0 and negative infinity for -0, or you can use Object.is(-0, 0) which will return false.

Weird. I just tried it out. So in Javascript you can have two variables, `a` and `b`, such that `a === b` and `1/a !== 1/b`.

No Signum function is a bane in many languages.

You sometimes want to be able to distinguish positive underflow from negative underflow. In the same way you sometimes want to distinguish positive overflow (infinity) from negative overflow (minus infinity).

If you're looking for true beauty, look no further than negabinary. (http://mathworld.wolfram.com/Negabinary.html)


     3 = (-2)^2 + (-2)^1 + (-2)^0 = 0110_-2
    -3 = (-2)^3 + (-2)^2 + (-2)^0 = 1101_-2
There is no signed bit, you don't have to worry about sign; everything just works like "normal" numbers because that in fact is what it is.

A pity it was never used except a few times in early computing. I'm never sure why 2's-complement won.

...why 2's-complement won.

Maybee conversion to/from character code is easier. Notice the conversion code in the linked article.

> conversion to/from character code is easier

(1) How often do you convert from machine integers to binary character representations?

(2) If you're referring to

        for(j = 0; j < bitlen; j++) {
            bin[j] = (i & 1) ? '1': '0';
            i >>= 1;    
that's the exact same for a negabinary machine.

I meant to/from numbers in text form. Usually decimal.

One thing that I find very fascinating about Gustavson's Unum work is that he proposes a lot of interesting ideas - not all new, as he is happy to remind you of himself - for encoding numbers:

> Type 2 unums are a direct map of signed integers to the projective real number line. The projective reals map the reals onto a circle, so positive and negative infinity meet at the top.


He also proposes to include the reciprocal of every included number in this projection, leading to a very nice property:

> To negate a unum, you negate the integer associated with the bit string, as if that integer was a standard two's complement number. Flip the bits and add one, ignoring any overflow; that gives you the negative of an integer. It works with no exceptions. But get this: To reciprocate a unum, you ignore the first bit and negate what remains! Geometrically, negating is like revolving the circle about the vertical axis and reciprocating is revolving it about the horizontal axis. And yes, the reciprocal of zero is ±∞ and vice versa.



While two's complement is quite clever, it's not an obvious choice when you are building things out of tubes or relays.

One's complement has two very nice properties:

1) the range is symmetric

2) "end-around carry" makes all the bits look identical in terms of implementation

But isn't that drawback of 1's complement that 0 in an 8 bit number can be represented two different ways?

0000 0000 and 1111 1111

Is it a drawback? If your "compare to 0" is "are all bits the same", then it would be an advantage.

An asymmetric range is a "drawback" of two's complement. Is that a drawback?

It depends upon what your building blocks in the technology are. For example, we don't use J-K flip flops anymore because they are a pain to make and use when MOSFET's are your building blocks (J-K wants bipolars).

A widget I built last year for interactively visualizing a number circle with various binary interpretations: https://thesis.laszlokorte.de/demo/number-circle.html

That's really cool! However, I can drag the SVG's viewbox around; something I wouldn't expect to be able to do.

EDIT: It makes sense when you zoom in (like you would with 7 bits), but if I'm zoomed out all the way, I wouldn't' expect to be able to pan around. I'd expect it to prevent panning past the edge.

Makes sense - thanks for the suggestion :)

It's an elegant construct, but this used to be a part of an entry-level course in every computer school I know of. Have things changed now?

I suspect lots of people here have little or no higher education. Mine was 10+ years ago, so an occasional refresher can be nice; I remember the general concepts, but not necessarily the details.

The category of Things I Once Knew but Have Since Forgotten because I Don't Them Often Enough probably includes 95+% of everything I've ever learned about both mathematics and computers/programming.

The people on here who rattle off the names of various mathematical theorems like it's nothing and act like it's weird not to remember how intro-level algorithms work without thinking really hard and doing some trial-and-error for a while or consulting a reference must have much more interesting jobs than I ever have. :-/

US universities with ABET-accredited programs teach 1's/2's complement to freshman EE/CpE majors in a first course on digital logic, which typical doesn't have any prerequisites.

My thought exactly when I read this kind of articles. Am I that old that they don't teach those simple concepts anymore?

"Why, when I was your age..."

Would it be possible to use - 0 & +0 in useful ways, like determining "direction" of previous operation? No clue how that could be useful, but I'm betting there's a use case out there.

Yes! Although I'm not sure if it would ever be useful for integers, it's vital in floating point: https://people.freebsd.org/~das/kahan86branch.pdf

Probably the most interesting part of 2's complement is how to negate numbers. Flip all the bits and add 1. Which is strange, because to negate the number again, flip all the bits and add 1. It feels like accidentally adding 2 doesn't it?

There is that wierd number 100000...0000 which is that extra negative number with no positive analog. I'm currently working with a noninteger value representation system that hijacks this value and assigns it as +/- infinity (negation is two's complement)

Shitty C code: the binary printing function is needlessly complicated.

    void print_bin(int x){  
	for(unsigned m=~(~0u>>1); m ; m>>=1)  

Shitty C code: I've seen Perl golfs more readable than this.

The article's code is more explicit and verbose, which makes it easier for non-C-programmers (like myself) to actually understand what's going on.

Understandably confusing for a non-C-programmer, but it is idiomatic C. It's not code golf; it's just the way one does machine independent bit twiddling.

A mask is used to test each bit position in turn. The tests look like this if written using binary literals:

    0b1000 & x
    0b0100 & x
    0b0010 & x
    0b0001 & x
The '&' is the bitwise AND opperator in C. Of course, we'd have to do as many of these as the word size so 1010111 uses a for loop that starts with the first mask, a 1 in the leftmost position, and shifts it right one position every time through the loop (using the C right shift operator >> on an unsigned mask value). When the one bit is eventually shifted out the right side of the mask, the mask is all zeros so the loop terminates because zero acts like false in the for loop test.

The only other tricky thing is initializing the mask. To set only the leftmost bit in a word the code uses the bit complement operator ~ of C. Breaking it down for a four bit example looks like:

    0u          == 0b0000
    ~0u         == 0b1111
    ~0u >> 1    == 0b0111
    ~(~0u >> 1) == 0b1000
This is the expression that appears in the for loop initializing the mask value, and it works for any word size.

The original article's code was definitely not idiomatic, efficient, or safe (the memory allocation for an array of characters could fail and segfault). The book Hacker's Delight is a great reference for those wanting to understand how to do low level coding, a requirement for close to the hardware work like writing device drivers.

It uses a bit mask from the leftmost bit to the rightmost and prints the respective bits in x. Is that more complicated than allocating memory on the heap, storing the bits, and then printing them in reverse?

Also the code above does not contain anything C specific, with the exception of 'putchar' let's say.

I think it's =much= easier to read compared to the one in the article. Although the usual approach is bit shifting the input and checking only the lowest/highest bit. Here is an example in Java:

  static void bin(int n) {		
    for (int i=0; i<32;  i++, n<<=1) System.out.print(n<0 ? '1': '0');		

This code (`x & m`) promotes x to unsigned, which means you're not actually printing the bit representation of a signed integer. You'd get the wrong bit pattern for a negative number on ones' complement machines, as signed-to-unsigned conversions are well defined in C and obey modulo semantics. So, for example, `-1` would print as `1 ... 111` instead of `1 ... 110`.

Thanks, i was not aware of that. What is a possible fix?

I was going to say that you just need to change m to int and fix the mask derivation to avoid directly or indirectly manipulating or reading the sign bit. And to do that you _only_ need to know the number of value bits. It turned out more complicated than that.

You can't reliably derive the number of value bits from the unsigned type on evil implementations. Using the range limits like INT_MIN and INT_MAX, though, you can deduce the number of value bits. It's useful that the definition of precision and width in of C11 effectively precludes, I think, a range which doesn't make full use of the available value bits. Also that the standard effectively only permits ones' complement, two's complement, and sign magnitude. Being able to reliably determine the number of value bits means you could carefully shift a masking bit through the set of value bits.

But confirming with the standard I was reminded that shifts of negative values are undefined. But I think we could use arithmetic to shift the bit. Preferably multiplication because I'm not quite grokking the requirements for signed division, and I _just_ learned that INT_MIN / -1 will cause a floating point exception on x86. Cool!

Also, with this general approach we'd never be able to peek at all the representation bits for the value and sign bits. We wouldn't see the high bit on two's and ones' complement implementations to show our hypothetical skeptic how it changes.

We could inspect all the representation bits by inspecting the int object as an unsigned char. But I don't think we could reliably differentiate the padding bits from the value and sign bits, especially on an evil implementation where the value bits weren't all contiguous or which toggled the padding bits semi-randomly just to screw with us.

On the contrary I'd say. When you're talking about low level (and basic CS101 details like 2's complement) I think it makes sense to be lucid rather than "needlessly complicated short" code that does too much in a single line.

> I was not curious to ask what is the need for one's complement or two's complement.

A problem with school, not the student (a common problem IMHO).

Good write up!

The Universe is two's complement. Item 154 of HAKMEM: http://catb.org/jargon/html/H/HAKMEM.html

you could as well provide an explicit mask to help (how far does the sign extend) python f.e a

         'bin(-5 & 0b1111)' gives 
which is what you want

    int bitlen = sizeof(i) * 8;
I wish life was this simple.

This is halfway to p-adic numbers and quote notation.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact