Signed integers are asymmetrical (borretti.me)
136 points by zetalyrae on Oct 24, 2021 | 148 comments



Unsigned integers are also, trivially, asymmetrical. Floats are symmetrical at the cost of redundant encodings. Fundamentally a binary representation of b bits can support 2^b values. 2^b is even, and any symmetric range that includes zero has an odd number of distinct values, so a binary representation must either be asymmetric, contain undefined values, or contain redundant encodings (e.g. negative zero).
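To make the counting argument concrete, here is a minimal C sketch (not from the original comment) showing the lopsided ranges at 8 bits:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* 2^8 = 256 encodings: 128 negative, 127 positive, 1 zero. */
        printf("int8_t:  %d .. %d\n", INT8_MIN, INT8_MAX);  /* -128 .. 127 */
        /* Unsigned is asymmetric around zero too: 0 .. 255. */
        printf("uint8_t: 0 .. %d\n", UINT8_MAX);
        return 0;
    }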


I once worked with a 16-bit processor that used two's complement with the convention — fortunately mostly just a convention — that 0x8000 was an undefined reserved value, rather than −32768. I say ‘fortunately’ because I was tasked with implementing C-style 16- and 32-bit arithmetic, where 0x8000 is perfectly reasonable as an unsigned value and perfectly reasonable as either half of a long. And ‘mostly’ because the hardware multiply instruction took two 16-bit signed operands and produced a 30-bit result.


Do you remember the name of that processor?


Yes, it was the Hamilton Sundstrand 1600.


> Hamilton Sundstrand 1600

Thank you.


The most beautiful way around this I've seen is the Posit number system. It's a floating point system that contains a single NaN which is the opposite of zero. https://posithub.org/


I'm a huge fan of this system, but it's worth pointing out that it doesn't have a single NaN, it has a single infinity with an undefined sign, modeling the projectively extended real numbers. NaN is a bit of an abomination; projective infinity is an uncommon beast, but one that can be worked with consistently. And yes, by pairing zero with infinity on opposite sides of the number ring, you get a symmetric system with no undefined or redundant encodings.


Signed infinity is useful for limits, and also for initial values of some algorithms, so missing them seems like a disadvantage.

Also, it still has no total order, so it keeps another messy part of IEEE floats.


There's an easy alternative: just use the MAX and MIN values. That's what you would do with integers anyway, in case of initial values. As for limits, if you need to express limits in a datatype for expressing a small subset of rational numbers, you've got the wrong tool for the job.


> Also, it still has no total order

This is not true. It is defined to have the exact same ordering, bitwise, as two's complement integers. That means it has a total order: the NaN is the most negative, then come the negative numbers, then zero, and then the positive numbers.


They used to call it "infinity with an undefined sign", then "NaR" for "Not a Real" but in the latest papers I've seen they just call it NaN. But that's just a name, it's the projective infinity as you say. I agree that the IEEE NaN is an abomination.


It still seems to be NaR here https://posithub.org/posit_standard4.12.pdf


NaN is a feature of dynamically typed languages; otherwise they would not be able to represent all outcomes of implicit coercion.


NaN is a feature of IEEE floats to ensure that all arithmetic operations can always produce a result. For example, 0/0 has to evaluate to NaN; no other result would be mathematically correct. If there were no NaN, the CPU would have to trigger a fault or return an incorrect result, both of which are undesirable.
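A quick C illustration of that rule (a sketch, assuming IEEE-754 doubles):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double zero = 0.0;
        double r = zero / zero;    /* 0/0 evaluates to NaN, no fault raised */
        printf("%f\n", r);         /* nan */
        printf("%d\n", isnan(r));  /* 1 */
        printf("%d\n", r == r);    /* 0: NaN compares unequal even to itself */
        return 0;
    }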


So, today I learned something: the JavaScript critics are wrong about yet another thing.


It crops up more in JavaScript since it doesn't have integers, and if you try to convert a non-numeric string to a number you get a NaN.


And here I thought NaN was a definition from IEEE-754.


Yes, I'm not sure what the parent means. Even Java's double has NaN.


Looks like there's no negative zero in posit, which is useful in some contexts: https://en.wikipedia.org/wiki/Signed_zero


Yes, that's a feature. I regard negative zero as a misfeature in most contexts. There might be some, where it's useful, but a general purpose datatype shouldn't be optimized for niche use cases.

There is no real number called "negative zero". For extremely small-magnitude negative values, posits have more to offer than IEEE floats, without the abomination that is known as denormal numbers. And for limits approaching zero, using a datatype that is supposed to represent a subset of rational numbers is the wrong tool for the job.


I've always hated this fact with passion but it's like hating gravity. You can't fight it. Personally, I would have gone with one's complement and a modified equality but that's just my OCD speaking.


Two things I hate but can't fight:

Tetrahedra almost tessellate but don't quite do it: https://www.pnas.org/content/pnas/103/28/10612/F1.large.jpg

In music, the frequency ratio of a semitone is ideally 2^(1/12), but without some tiny fudging (called temperament), you can't make harmonies, as the frequencies almost but don't quite line up right. I forget exactly how this one works so I may have something off.


“12 tone equal temperament” is a good string to Google if you want to learn about that. Most pianos are intentionally out of tune just enough that they’re good enough for every key.

Other coincidences that drive me wild: the speed of light is almost but not quite 3.0e8 m/s, and the fine-structure constant is almost but not exactly 1/137.


Pianos are more complicated, because they also use stretched octaves that aren't just a simple 2:1 ratio, due to inharmonicity of the thick bass strings. There are only a few exceptions, where instead of thicker strings the piano was made really long.


>speed of light is almost but not quite 3.0E8m/s

On that subject, if you decided to make your base unit of length the distance light travels in a nanosecond it'd almost be a foot but not quite.


I know that the speed of light is c, or 1, or 3e8 m/s, but I really feel like it's a foot per nanosecond, because that's how I measured it in the lab one day messing around with BNC cables and an oscilloscope.

Similarly the earth-sun distance is 8 light-minutes. These feel right, like measuring mass in stone for people, kilos for sugar, carats for diamonds, electron-volts for particles, etc.


12 tone equal temperament is all wrong, but never by much, and it is very convenient, so almost everyone uses it.


> In music, the frequency ratio of a semitone is ideally 2^(1/12), but without some tiny fudging (called tuning), you can't make harmonies as the frequencies almost but don't quite line up right

IIRC, it's not just harmonies; the range of a piano is big enough that if you tune each octave exactly based on that ratio, you'll end up with the first and last octaves sounding off from each other.


If you tune with exact 2^(1/12) semitones then all your octaves will be in tune for obvious reasons. (And pianos are normally tuned this way, equal temperament, so I don't know what the grandparent is talking about; for any tuning system some intervals are just and others are not. Equal temperament gives you just octaves, Pythagorean tuning gives you just fifths, meantone temperaments give you just thirds).


> If you tune with exact 2^(1/12) semitones then all your octaves will be in tune for obvious reasons.

Hypothetically, or on an electronic instrument, you could. But if you did all 2^(1/12) ratios, your octaves wouldn't be in tune. Strings on a piano do not behave like an ideal string. Their overtones are not 2X, 3X, 4X, 5X, etc. times the fundamental frequency. Instead, the actual overtones are higher than the ideal frequencies. This is called inharmonicity (https://en.wikipedia.org/wiki/Inharmonicity).

So when tuning a piano, you have to tailor the way you tune it to each different piano if you want that piano's lower strings to be in tune with its higher strings.


> This is called inharmonicity

I think I've been hearing this for a long time but didn't realize it was real so questioned my perceptual system.

Thank you for the info!


There's a pretty big community of musicians who do away with almost all constraints wrt tuning now.

Some of it, of course, sounds like cats screeching but some of it is genuinely astonishing.


For anybody wanting to hear this, look for "microtonal" music, or the artist Sevish is one that I know of.


Basically, at the least you want to create the exact frequency ratios of 3/2 (the perfect fifth) and 4/3 (the perfect fourth), and also all their powers and inverse powers, which is not achievable in one scale. Instead we substitute 2^(7/12) = 1.4983 for 3/2 = 1.5, and 2^(5/12) = 1.3348 for 4/3 = 1.3333.
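A short C sketch of those substitutions, printing the just ratios next to their equal-tempered stand-ins:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Just ratios vs. their 12-tone equal temperament approximations. */
        printf("fifth:  3/2 = %.4f  vs  2^(7/12) = %.4f\n",
               3.0 / 2.0, pow(2.0, 7.0 / 12.0));
        printf("fourth: 4/3 = %.4f  vs  2^(5/12) = %.4f\n",
               4.0 / 3.0, pow(2.0, 5.0 / 12.0));
        return 0;  /* compile with -lm */
    }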


It's more the opposite: equal temperament is a compromise that fudges but mostly preserves the important harmonies (thirds and fifths) in all keys.


If the maximum size of your number type is anywhere within 10x of your largest values, you should definitely be using a larger number type. It’s just a way to write down a number. It’s like getting mad about the fact that negative numbers, when written by hand, require an extra character.


> It’s like getting mad about the fact that negative numbers when written by hand require an extra character.

Now that you mention it, that convention does make me mad. Not because I have to write an extra character, but due to operations getting mixed-up based on implied context. For example:

Unary + is implicit: 3 = +3

Addition is written as juxtaposition: +2+5 = add(+2, +5) = +7. The first number can have an implicit sign, e.g. 2+5 = +2+5

"Subtraction" is a redundant operation; it's just addition involving a negative number: 8-3 = +8-3 = add(+8, -3) = +5.

The nice thing about this perspective is that subtraction commutes: +8-3 = -3+8.

What's annoying is that we also take juxtaposition to mean multiplication, and this flip-flops depending on implicit characters like unary '+'. For example:

    -8-3 = add(-8, -3)
    -8+3 = add(-8, +3)
    ab = multiply(a, b)
    2a = multiply(2, a)
    a2 = multiply(2, a)  (non-idiomatic)
    -2a = multiply(-2, a)
    a-2 = add(a, -2)
This isn't just a problem when mixing variables with literals, since juxtaposition of literals also means multiplication (as long as they parse to separate numbers), e.g. '13' is a two-digit number, but:

    (1)(3) = multiply(1, 3)
    (-1)(-3) = multiply(-1, -3)
    -1-3 = add(-1, -3)
    -1(-3) = multiply(-1, -3)
    (-1)-3 = add(-1, -3)
Note that we can do the same thing for multiplication and division, if we have a unary reciprocal operator, e.g. ÷2 = 1/2. That way, division is just multiplication involving an inverse number, which commutes; e.g. 6÷2 = 6×÷2 = multiply(6, ÷2) = multiply(6, 1/2) = multiply(1/2, 6) = ÷2×6.

This seems weirder than the case of addition/subtraction, probably because we already have the horizontal-bar notation for division, which seems even nicer. Note that the "÷" character itself is simply an inline approximation of the horizontal-bar notation (with placeholders above and below); the "foo/bar" notation is a more direct inline approximation (no placeholders, just 'tipping' the bar). Interestingly we don't use "bar\foo" to mean the same thing.


Partial application of the binary grouping operators with the first operand implicit does not have to stop at division: https://news.ycombinator.com/item?id=25278021


I feel the same way - if computers happened to use base 3 instead of base 2 we could have fixed-width signed integers which are symmetric about zero.
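For the curious, a minimal C sketch of balanced ternary (digits -1, 0, +1), whose n-digit range [-(3^n-1)/2, (3^n-1)/2] really is symmetric about zero:

    #include <stdio.h>

    /* Encode n into `len` balanced-ternary digits, least significant first. */
    void to_balanced_ternary(int n, int digits[], int len) {
        for (int i = 0; i < len; i++) {
            int r = ((n % 3) + 3) % 3;       /* remainder in 0..2 */
            if (r == 2) { r = -1; n += 1; }  /* digit 2 becomes -1, carry up */
            digits[i] = r;
            n /= 3;
        }
    }

    int main(void) {
        int d[4];
        /* 4 digits cover -40..40, since (3^4 - 1) / 2 = 40. */
        for (int n = -40; n <= 40; n += 40) {
            to_balanced_ternary(n, d, 4);
            printf("%3d -> %2d %2d %2d %2d\n", n, d[3], d[2], d[1], d[0]);
        }
        return 0;
    }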


It unfortunately makes all the hardware very messy; end-around carry is not a pretty thing. You end up needing fiddly flagged prefix adders.


Further down the rabbit hole: https://en.wikipedia.org/wiki/Gray_code#Balanced_Gray_code Have yet to discover/work out an addition algo for balanced similar to Harold Lucal's algorithm https://stackoverflow.com/questions/20923165/gray-codes-addi...


Or just constrain the problem to support only [-(2^(N-1)-1), 2^(N-1)-1].

This gets you an extra bit of precision when multiplying two signed numbers, because you no longer need to leave room for two sign bits in the product.


> redundant encodings (e.g. negative zero).

But negative zero is not a redundant value. It behaves very differently from positive zero.


Does it? The only thing I am aware of that it does differently is decide between negative or positive infinity, which doesn't exist for integers.


Negative zero doesn't exist for integers either.

(Note also that -0 is an additive identity, and +0 isn't.)


> Note also that -0 is an additive identity, and +0 isn't.

Wait what

That's really confused me. Is this purely with reference to addition operations as defined in IEEE-754, or is there an underlying mathematical reasoning that makes it make sense?


In the underlying mathematics, the only additive identity is zero, which is neither positive nor negative. IEEE-754 has signed infinitesimals instead, so it can't follow that. You just have to pick one to be the identity, and they picked -0.

I assume the reasoning was that they would prefer values to be +0 when possible. Assigning the sum of -0 and +0 to be +0 instead of -0 means that -0 is the identity and +0 isn't.
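You can watch that choice in action with a few lines of C (assuming IEEE-754 with the default rounding mode):

    #include <stdio.h>

    int main(void) {
        double nz = -0.0, pz = +0.0;
        printf("%g\n", pz + nz);          /* 0: adding -0 leaves +0 alone */
        printf("%g\n", nz + nz);          /* -0: -0 preserved */
        printf("%g\n", nz + pz);          /* 0: adding +0 turns -0 into +0 */
        printf("%d\n", 1.5 + nz == 1.5);  /* 1 */
        return 0;
    }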


Unless you're on a machine which uses one's complement or sign + magnitude, which C currently allows. Luckily, C is moving to make two's complement mandatory.


Floats still have an issue for this particular piece of code though! abs(-0) would return -0


IEEE754 specifies that abs(−0) is 0 (positive). (If your machine doesn't use IEEE754, I'd love to hear more about your project.)


Some coworkers had to deal with MIL-STD-1750A [0] floating point recently. None of us had ever heard of it. They were talking to some pretty old HW.

[0] https://en.wikipedia.org/wiki/MIL-STD-1750A


Wow, a lot of famous spacecraft using that.


The commenter above you isn't talking about IEEE754's abs() function, but rather the code from the OP, except applied to floats rather than ints.


That code also has the same problem for one's complement and sign+magnitude, which are allowed encodings in C.


This is going away in the next version of the C standard. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf


Oh, nice!


Floats could have had NaN occupy that encoding slot to balance against zero, but chose instead to add other complications and redundant encodings.


I wonder if negative zero is redundant or actually has some uses.


Yes, it has uses. E.g., see Kahan's "Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit", copy available at

https://people.freebsd.org/~das/kahan86branch.pdf

(he gives an example regarding complex branch cuts and fluid dynamics applications.)


It is carrying data now, so I'm sure many programs use it.


Another interesting case is dividing INT_MIN by -1. On x86 at least, that will not only overflow, but raise an arithmetic exception too, just like divide by zero. So it's a useful trick for crashing code that tries to harden itself against SIGFPE being raised.


Just tried this on x86-64, doesn't crash.

    #include <limits.h>
    #include <stdio.h>

    int main(int argc, char const *argv[])
    {
     printf("%d", INT_MIN/(-1) == INT_MIN);
     return 0;
    }


Give this a whirl:

    #include <limits.h>
    #include <stdio.h>

    /* The function-pointer indirection keeps the compiler from
       constant-folding the division away at compile time. */
    int Div(int a, int b) { return a / b; }
    int (*pDiv)(int, int) = Div;
    int main(int argc, char *argv[]) {
      printf("%d\n", pDiv(INT_MIN, -1));
      return 0;
    }


Yes, for signed integer division of any bit-width this is the only case that results in overflow (division by zero is another exceptional case).


At some point in time you could crash the Solaris kernel by running a DTrace program that would divide INT_MIN by -1.


I'm still convinced that C made the wrong decision to have abs(int) return signed int. Returning unsigned int would've been more semantically accurate, as well as eliminating a source of UB. At least Rust has unsigned_abs() now.
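For illustration, a minimal C sketch of that signature (uabs is a hypothetical name, not a standard function): casting to unsigned first makes the negation well-defined modular arithmetic, so even INT_MIN works:

    #include <limits.h>
    #include <stdio.h>

    unsigned int uabs(int x) {
        /* Conversion to unsigned is defined mod 2^N, so no overflow here. */
        return x < 0 ? 0u - (unsigned int)x : (unsigned int)x;
    }

    int main(void) {
        printf("%u\n", uabs(INT_MIN));  /* 2147483648 with 32-bit int */
        return 0;
    }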


Analyzing uses of abs(SIGNED) in a large software system several years ago, I found that the most useful replacement was `UNSIGNED absdiff(SIGNED a, SIGNED b)`. The _absolute_ difference of two N-bit signed numbers can always fit in an N-bit unsigned number, but lots of `abs(a-b)` involve intermediate values which don't fit in the SIGNED type and thus invoke undefined behavior. The system was in C++, so it was possible to provide an absdiff template that covered all integral types. However, actually using it tended to introduce cascading type changes because the types of the expressions `abs(a-b)` and `absdiff(a,b)` differed.
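A minimal C sketch of that absdiff idea (following the comment's name, sketched for int): doing the subtraction in unsigned arithmetic, where wraparound is defined, sidesteps the intermediate overflow:

    #include <limits.h>
    #include <stdio.h>

    unsigned int absdiff(int a, int b) {
        /* Subtract in unsigned arithmetic, picking the positive direction;
           the result always fits, even for absdiff(INT_MAX, INT_MIN). */
        return a > b ? (unsigned int)a - (unsigned int)b
                     : (unsigned int)b - (unsigned int)a;
    }

    int main(void) {
        printf("%u\n", absdiff(INT_MAX, INT_MIN));  /* 4294967295 */
        return 0;
    }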


The decision is so (apparently) obviously wrong that I'd really like to know what the decision process was.

My first guess is that it involved some kind of architectural peculiarity of the PDP11.


1) Only "Two's Complement" signed arithmetic is asymmetric.

One's Complement arithmetic wasn't uncommon on computers of the era and is fully symmetric. And, I believe the PDP-7 used one's complement.

2) "unsigned" didn't get implemented until about 5 years after C was initially implemented.


fabs() returns a signed result so abs() mirrors its behavior. Also if you're taking an absolute value, you're working in close proximity to negative numbers and risk accidentally performing subsequent operations on mixed types that cause an implicit cast from signed to unsigned. C's integer promotion rules allow this to happen.


If abs() returned unsigned then an expression such as -abs(x) would be surprising and wrong.


If abs(int) were being defined now in C, I would advocate that the return value be signed int.

Why? Because it's quite likely the result will be used in arithmetic expressions involving other signed ints and/or subtraction. That goes with the kinds of things you'd normally use abs() for.

There will be some cases where abs() is being used by someone on a value that could be INT_MIN. But when your code might have to handle the full range of possible integers (signed or unsigned), there are probably many other things to get right as well. In such code, abs() is the least of your worries.

I think most uses of abs() are in arithmetic expressions of "numbers whose values we don't expect to overflow", and I'm fairly confident if abs() returned unsigned int, there would be more bugs that nobody spotted in the world than with it returning signed int. Not many more, because abs() is rarely used, but a few.

I used to be one of those people who felt it made sense to use unsigned types in C for values that can never be negative. That was tidier and clearer. It stated my intent.

After a few years of coding in C like this, I changed my mind: I spotted occasional little hidden bugs here and there, undetected by the compiler or the programmer, from overzealously using unsigned types just for "showing intent" where signed would have been fine.

It's not like using unsigned types prevents arithmetic bugs in C in practice (unless you really have values exercising most of the unsigned range). The "use unsigned because types should reflect intended range" argument is muddier than it first looks: The range [0..2^B-1] is no more "correct" than [-2^(B-1)..2^(B-1)-1] for almost all quantities, as people often aren't paying attention to correct handling of numbers in the upper end of the unsigned range either. They rely on "practical numbers are small enough that it doesn't matter", same as with signed ints.

If unsigned acted as a range-constrained arithmetic type in C, meaning "this variable can only hold a subset of the default integer type" and "arithmetic with other types is consistent", that would be different.

But it doesn't. It acts more like an unsignedness virus in C arithmetic, adding unnecessary boundary conditions into innocent-looking expressions.
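For instance, two innocent-looking expressions that quietly go wrong (a minimal C sketch):

    #include <stdio.h>

    int main(void) {
        unsigned int a = 2, b = 3;
        if (a - b > 0)               /* 2 - 3 wraps to UINT_MAX */
            puts("2 - 3 > 0 ?!");

        int len = -1;
        if (sizeof(int) > len)       /* len converts to a huge size_t */
            puts("expected");
        else
            puts("sizeof(int) <= -1 ?!");
        return 0;
    }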

Of course you can be aware of these issues and avoid them. I'm pretty experienced and can avoid such issues easily. But having to be careful for no added benefit, especially with multiple people, just raises risks. So now I default to, and advocate, sticking with signed integer types for values representing arithmetic quantities. If I were designing a language, I'd probably advocate for ranged types instead. That means limited to ranges like [0..2^(B-1)-1] (the upper half of the signed range), and have consistent arithmetic. But C is not like that.

Unsigned types in C are of course completely appropriate for bitwise and modular arithmetic uses, and for holding raw data. I'm a big fan of using them for relative timestamps using modular arithmetic comparisons, as done in Linux. If you're writing compression routines or a database engine they will be very useful.

But for general arithmetic quantities, nowadays my view is similar to this answer: https://stackoverflow.com/questions/51677855/is-using-an-uns.... I'm sympathetic to the "unnecessary discontinuity near common values" and "arithmetic closure is more useful" view. People won't agree on this. They don't agree in answers to that SO question. All I can say is, I used to be zealously "use unsigned for non-negative quantities", and then after some experience it seemed more pragmatic and safe-by-default to stop doing that, and the type-as-intent was misguided anyway when the upper range of unsigned wasn't being used. You still have to be unsigned-aware in C due to size_t especially. So I still use unsigned-by-default for arithmetic dancing around object sizes, memory and offsets in C.


Totally agree with this sentiment. In javascript, there's only one number type: Number. And it's a 64-bit float. You don't even need integers as a programmer, it's a low-level implementation detail that JIT sometimes employs to speed up your code.

If you absolutely have to, there are escape hatches like Uint8Array/Buffer/etc which allow you to deal with bytes. There are also ways to force JIT to treat your numbers as integers with tricks like `i = i|0`, but that's barely used.

Obviously compiled languages are different, but to me you could rename `signed integer` to `integer` and `unsigned integer` to `rawdata32` or `byte4`


Using the last bit isn't a business-logic task, and we know it from history. We used 32-bit integers to represent file size; did it matter if it was signed or unsigned? No, file sizes outgrew both ranges at the same time. Can we fix the year 2038 problem by using an unsigned 32-bit time_t? No, we use a 64-bit type. Databases normally don't support unsigned types, but it might feel like if the signed 32-bit range is not enough for identifiers, unsigned might be? No, we use 64-bit integers. Blueborne involved a 16-bit unsigned integer, which should be enough for anyone, but a forged packet overflowed it.

>"showing intent"

I think intent of a signed integer is "it's a number, don't think weird things here".


How about removing implicit arithmetic conversions?


That's exactly the problem. Widening conversions between types of the same signedness are probably fine, but anything else is madness.


I currently follow "use unsigned types", but add `-Wall -Wextra -Wconversion -Wsign-conversion` (and on Clang you also need `-Wtautological-unsigned-zero-compare`) and keep my code at zero warnings, because otherwise C++ (and C)'s implicit conversions are a footgun. After I've added these warnings, I don't think I've hit any bugs due to mixing unsigned and signed quantities (though it's definitely added work to keep track). I did hit "accidentally allowed an unsigned value to go negative/overflow" twice so far, once on MSVC (I don't know a good fix) and once on Clang (that's when I added `-Wtautological-unsigned-zero-compare`, to warn me when I write a check for a value which I think is negative, but is actually UINT_MAX).

One reason I picked this route is because Rust only lets you index arrays by usize and not isize, so I chose to extend this "unsigned for non-negative quantities" philosophy to C++. Additionally, std::vector::size() is size_t and unsigned, so I decided to follow. Because of the mixture of unsigned and signed indexing (and 32 vs. 64 bit values) across different libraries, I decided to turn on warnings so I know where incompatibilities lie. Rust outdoes C++ because it makes incompatible integer widths hard errors, comes with the equivalent of `-Wtautological-unsigned-zero-compare` out of the box, and has runtime checking (in debug mode) which panics if you decrement an unsigned value past 0.

I can understand "signed by default" even though I disagree and prefer not to use it for my own code. And I think 31-bit integers are a good idea if you don't need the range of an unsigned integer. I wish Rust had a type for "31-bit integer with negative values serving as a niche for enum cases" (though my concern is that a zero-overhead mutation API without runtime checks, combined with niche filling, would be UB since you can decrement 0 to -1 and effectively transmute an enum holding a u31).

Also signed integers aren't fully trouble-free, and can still overflow for very large differences (though that's less likely than 2 - 3). The Stack Overflow post mentions:

> Want to find the "delta" between two unsigned indexes into a file? Well you better do the subtraction in the right order, or else you'll get the wrong answer.

Well naive signed integer subtraction can be UB as well.

    constexpr int f() {
        return 0x7fffffff - -0x7fffffff;
    }

    constexpr int x = f();

    <source>: In function 'constexpr int f()':
    <source>:4:23: warning: integer overflow in expression of type 'int' results in '-2' [-Woverflow]
        4 |     return 0x7fffffff - -0x7fffffff;
          |            ~~~~~~~~~~~^~~~~~~~~~~~~
    <source>: At global scope:
    <source>:7:20:   in 'constexpr' expansion of 'f()'
    <source>:7:21: error: overflow in constant expression [-fpermissive]
        7 | constexpr int x = f();
          |                     ^
    Compiler returned: 1
As jepler mentioned (https://news.ycombinator.com/item?id=28983587), if you want to handle arbitrary difference without overflow, you may need to compute the absolute difference and the sign separately. Sadly it's a massive pain to accomplish.

I guess this case (large numbers) is very rare compared to accidentally subtracting 3 from 2, and addition also risks overflow. And if you're using subtraction for deltas, then restricting file sizes to 2^31 - 1 or 2^63 - 1 makes differences fit in int32_t or int64_t (64-bit ptrdiff_t).


> Well naive signed integer subtraction can be UB as well.

Sure, but the point is that you have to get overflow due to high magnitude of the involved integers. So yeah INT_MAX - -INT_MAX is UB, but you have to have quantities up around INT_MAX for this to be a problem. For the unsigned case you just have to have quantities around 0.

Now, having quantities around INT_MAX isn't necessarily that unusual, but what seals the deal for me (I wrote that SO answer) is that 64-bit values are becoming more ubiquitous, and there we can guarantee in some sense that many quantities won't reach their 2^63 limits any time soon. So I definitely think "signed by default" is much more of a slam dunk if it is paired with "64-bit by default".

Note that though I'm a proponent of signed by default, in practice I find it hard to write this way in C and C++ because you are constantly fighting the impedance mismatch between the language, size_t, size(), etc and your own rule. So signed by default is definitely more palatable in languages which made that same decision for their builtins and standard library.


> So I definitely think "signed by default" is much more of a slam dunk if it is paired with "64-bit by default".

Yes yes and yes. There is nothing that would make me happier than if GCC and Clang gave us the freedom to choose an ILP64 data model. In that case, we wouldn't need prototypes anymore and we could restore much of the original intent behind the design of C.


FWIW, I think these days unsigned size(), although something that conceptually makes sense, is generally considered a mistake although not something that the committee can easily fix.


In C++, if I use unsigned variables whenever possible, the users get an invariant for free that these indexes are never negative (though C++ has no overflow checking to catch "negative" overflowing values at time of creation). Would a language with signed sizes and indexing lose out on this property, and require data structure and algorithmic/business logic to compulsively check all outside input to ensure these values are non-negative (because negative values are in the type's domain, but invalid for many algorithms)? What about compulsively checking internal state (which may/not have the intended and/or enforced invariant that internal state values are non-negative)?


As discussed elsethread, out of bounds unsigned integers are as wrong as negative indices. And checking for positive and in bound is no more expensive than checking for just within bounds.

The only issue with negative indices is that it is no longer possible to have containers-of-bytes as large as the address space, but except for that tiny time window when a usable 4GB address space was actually available on 32-bit systems, it is in practice not a big restriction.


>Additionally, std::vector::size() is size_t and unsigned, so I decided to follow.

std::string::find is size_t and unsigned too, now guess what it returns in corner cases.


What you have to love here is that the answer is the maximum value of the type, which is... not great but it's perhaps the best of many bad options. However the way it's typically written is -1 even though this is an unsigned type, because C++ thinks assigning negative values to unsigned variables is a completely reasonable thing to do and needn't result in a diagnostic, let alone refuse to compile...
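What that -1 assignment actually stores, in a C sketch (std::string::npos in C++ is defined the same way, as size_type(-1)):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        size_t npos = -1;                  /* converts modulo 2^N */
        printf("%zu\n", npos);             /* 18446744073709551615 on 64-bit */
        printf("%d\n", npos == SIZE_MAX);  /* 1 */
        return 0;
    }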


I think that would've only helped a little bit. I imagine most instances of this problem don't occur in abs() (which is just an illustration of the underlying problem), but when people perform simple negation.


When I have built true absolute value operations--which has actually been pretty rare as I don't think you usually want "integers" for most computer work and are better off thinking in terms of finite fields instead of numbers whenever possible--I add a check that the result is actually non-negative (which is the kind of sanity check one would also want to write to verify you don't lose precision, saturate, or overflow in other "I am using these as numbers instead of a finite field" cases); or, if I am working with a number I know to be negative, sometimes I go with "the additive inverse wasn't the number itself", as in this code:

https://github.com/OrchidTechnologies/orchid/blob/be5cc32a16...


> I don't think you usually want "integers" for most computer work and are better off thinking in terms of finite fields instead of numbers whenever possible

I think it is the other way around, and that's why high level languages (under some definition of HLL) tend to have arbitrary precision integers. Python, Ruby, Lisp, and now Javascript (bigint) all have them. Finite fields are almost never used except in some special applications like cryptography or error correcting codes. Multiplication in the field with 2**32 elements doesn't look anything like integer multiplication. Normal computer arithmetic is not finite field arithmetic.

We really should be using trap-on-overflow in C most of the time. Making that difficult on RISC-V seems to me like a poor design decision.


Yeah... I absolutely appreciate my belief is controversial "at best" ;P, but I stand by it, and even (personally) consider this one of the defining characteristics of someone who truly "understands" and thinks fluently and natively in the mental model of practical computation vs. someone who would prefer to translate from their real-world experience into the inherently-finite world of machines.

That said, I certainly agree that if you are working with an actual unbounded numerical value you should be using a data structure that at least attempts to simulate an unbounded numerical value; but, what people always insist on instead is to continue using their fixed integer types which they want to magically have compatible semantics with unbounded integers... a fiction they insist the compilers and even CPUs join them in crafting, with things like the "trap-on-overflow" feature you are advocating for :/.

The result of this is that, rather than accepting that types like "uint64" are finite fields that should have perfectly regular overflow and inverse semantics that should under no circumstances be "messed with" by the environment, and that they should be using "BigInt" for any actual numbers that occasionally come up, people keep ruining the finite field types by trying and, in the end, failing to add a bunch of weird optimizations and overflow trap semantics to their operation... mitigations that only cause further errors later: these types should have the most brutal possible interpretation of finite field math, and anyone who wants an actual number--maybe to represent a physical or monetary quantity--should be forced to not consider them an option.

Meanwhile, languages should do anything and everything possible to not pessimize the syntax of types like BigInt by providing a short name for a good implementation of the type along with operator overloading that prevents anyone from ever considering the usage of a type like C's "int" to represent a number merely because it is easier or looks better or feels natural: you fix this issue, and "trap on overflow"--a ridiculous semantic that makes software look like it works until you push it a bit, and in a way that fails to be fungible with (and accepting the same solutions as) running out of memory--becomes a feature that has no legitimate usage.


uint64 is not a finite field type. For example, the number 2 has no multiplicative inverse mod 2**64. Ok I guess I understand what you meant, it's just that the mathematical concept of a finite field means something completely different than that. The integers mod 2**64 are a ring even though not a field, if that helps. But I'll stop quibbling over terminology.

We do in fact normally use integer datatypes to represent integers rather than elements of a cyclic group. For example, we expect n+1 to always be greater than n unless we are doing something not-so-common. In (say) Python, the int type is bigint, sometimes informally called "infinite precision" though their size is actually bounded by the size of the computer's memory. If your integers get too big, your program runs out of memory and weird things can happen, such as crashing, the OOM killer clobbering the process, or whatever. Basically undefined behaviour. In C it is the same thing, an int64 is infinite precision except bounded by the size of the machine word instead of by the memory size. If your program experiences uint64 overflow it's likely to be a genuine error condition unless the program is doing bit twiddling rather than arithmetic. In the case where it is an error condition, of course I'd want to trap it.

Haskell makes the distinction between the Int type (an integer expected to fit in the size of a machine word) and the Word type (a bit pattern that supports some arithmetic operations). That is really what C should have done from the beginning. I've been wanting to check how Rust handles it.


> The integers mod 2**64 are a ring even though not a field, if that helps.

Oh, ugh: I actually usually say "modular ring", but I'm not even sure that's a technical term, so for this comment I tried to get all fancy (as I thought I remembered that all rings are fields but not all fields are rings) and failed :(.

> We do in fact normally use integer datatypes to represent integers rather than elements of a cyclic group.

I assume our core disagreement comes because I believe that fundamentally everything representable on a Von Neumann computer is cyclic due to how memory works. I continue to 100% appreciate that this is an incredibly controversial opinion and that very few people are going to agree with me on this.

> If your program experiences uint64 overflow it's likely to be a genuine error condition unless the program is doing bit twiddling rather than arithmetic. In the case where it is an error condition, of course I'd want to trap it.

However, here I want to claim you are just wrong: you simply shouldn't be using that type for this... you should be using a big integer type; and the core solution here should NOT be to make overflow of this type trap, but to figure out why you are using this type in the first place and fix the language until you stop.

> In (say) Python, the int type is bigint, sometimes informally called "infinite precision" though their size is actually bounded by the size of the computer's memory. If your integers get too big, your program runs out of memory and weird things can happen, such as crashing, the OOM killer clobbering the process, or whatever.

This is fundamentally different, though: this isn't your program being wrong, but the system executing your program being wrong. Your program should strive to be correct, and insisting that "no one is ever going to need more than X-bits worth of Y" as an assertion in your program--as opposed to it merely being a limitation of the machine running your program--is a mistake.


Rust offers:

* i8::MIN.checked_abs() is None

* i8::MIN.overflowing_abs() is (i8::MIN,true)

* i8::MIN.saturating_abs() is i8::MAX

* i8::MIN.wrapping_abs() is i8::MIN

[edited to add]

* i8::MIN.unsigned_abs() is the 8-bit unsigned value 128

and finally

* i8::MIN.abs() will panic in debug builds or give i8::MIN in non-debug builds.

You should obviously not choose this last option if you in fact will call this function on i8::MIN (or any similar minimum values) which is why it panics if that happens. You should have instead chosen which behaviour you wanted (checked, overflowing, saturating or wrapping) up front.


> I stand by it, and even (personally) consider this one of the defining characteristics of someone who truly "understands" and thinks fluently and natively in the mental model of practical computation vs. someone who would prefer to translate from their real-world experience into the inherently-finite world of machines.

For my own personal vanity, is someone who hasn't studied high enough level math enough to have a formal sense of what "finite field" means necessarily non-fluent in practical computation, or is it possible to have sufficient intuition about computational versus "real-world" integers without being able to express it?


Programmers may not know what a finite field means, but they must know for any operation that they are using whether the operation is invertible or not, i.e. whether from the result you can recover the input value or not.

If they are not aware of this property, bugs will certainly be caused by this, sooner or later.

In a set of modular numbers where the modulus is not a prime number, e.g. in C/C++ "unsigned char", "unsigned short", "unsigned", "unsigned long" and "unsigned long long", where the modulus is a power of two, multiplication is invertible only if the multiplier is relatively prime with the modulus.

In the C/C++ modular numbers that means that only multiplication with odd numbers is invertible.

A consequence of this fact, which should be familiar to most programmers, even if they might not be aware of the cause, is that we have only one kind of shift to the left (i.e. multiplication by 2), but 2 kinds of shift to the right (division by 2).

In the set of integers modulo 2^n, for any even value there are 2 numbers that, multiplied by 2 (i.e. shifted 1 position to the left), give that same value, and those two preimages correspond to the so-called logical shift to the right, a.k.a. unsigned shift to the right, and to the so-called arithmetic shift to the right, a.k.a. signed shift to the right.

The names of the 2 shifts to the right are misleading, because both are well-defined meaningful operations for what in C/C++ are named "unsigned" numbers, but which are defined in the standards as modular integers, not unsigned integers (which would give exceptions or saturation on overflow).

The fact that in C/C++ you get for ">>" one of the 2 shifts depending on whether the operand is "signed" or "unsigned" is just a convention. However this convention is useful in most cases, because in C/C++ smaller numbers are typically considered as being truncated from larger numbers that correspond to either their zero-extended or their sign-extended equivalent, and not as really being modular numbers as they are defined and as they behave.
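A small C sketch of those two preimages and the two right shifts (note that `>>` on negative signed values is implementation-defined in C, though in practice arithmetic):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t y = 0xF0;               /* an even value mod 2^8 */
        uint8_t a = 0x78, b = 0xF8;     /* b = a + 0x80 */
        /* Both double to the same value: */
        printf("%x %x\n", (uint8_t)(a * 2), (uint8_t)(b * 2));  /* f0 f0 */
        /* The two right shifts recover the two different preimages: */
        printf("%x\n", y >> 1);                     /* logical:    78 */
        printf("%x\n", (uint8_t)((int8_t)y >> 1));  /* arithmetic: f8 */
        return 0;
    }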

In languages that impose more constraints on the programmers, e.g. Ada, the programmer need not be so aware about how numbers are represented and which is the meaning and properties of the operations applied to numbers.

On the other hand, in C/C++, which besides some implicit safe conversions also have a large number of unsafe unchecked implicit or explicit number conversions, programmers will very likely cause some bugs eventually, unless they understand well the differences between modular numbers, signed integer numbers, unsigned integer numbers (really unsigned numbers, not those named so in C), how these numbers are compared by the hardware, depending on whether they are signed or not, how and when overflow is signaled by the hardware, also depending on whether the numbers are signed or not, which operations are invertible and which not, and so on.

Sometimes, especially when programming for embedded computers, the freedom of C/C++ is convenient, but nonetheless a lot of care is needed to not be surprised by undesired implicit conversions and truncations, like in the example that started this thread.


> finite fields instead of numbers whenever possible

A nitpick: 32 or 64 bit integers are not finite fields. One of the finite field axioms is that any non-zero number must have a multiplicative inverse, which 2 does not have.

They are, however, commutative rings.


I wouldn't call that a nitpick; I'd call referring to these a serious misuse of terminology! Finite fields are altogether quite different.

There are finite fields of order 2^k for each k, but they work quite differently from the integers modulo 2^k.

I think the best short term for describing how machine integers work is to just say "modular arithmetic".


> I'd call referring to these a serious misuse of terminology! Finite fields are altogether quite different.

Sure, but also the important part of OP's argument seems to be that numbers be treated as members of the appropriate algebraic structure. Simply subbing in the correct terminology should fix the issue.

> I think the best short term for describing how machine integers work is to just say "modular arithmetic".

For addition this is correct, I don't believe it's correct for multiplication because of the negative integers. That is, the ring described by signed n-bit signed integers under multiplication and addition is not isomorphic to (Z_{2^n}, +, *).


> For addition this is correct, I don't believe it's correct for multiplication because of the negative integers. That is, the ring described by signed n-bit signed integers under multiplication and addition is not isomorphic to (Z_{2^n}, +, *).

No, it is. It is in fact just Z modulo 2^n, just with the choice of representatives being { -2^(n-1), ..., 2^(n-1)-1 } instead of { 0, ..., 2^n-1 }. This is pretty easy to check... do you have an example to the contrary?


So, the reason I wanted to talk about this mathematical type--and then used the wrong term :(--is because of how +,* forms a ring... and so does ^,&. It isn't sufficient to say "modular arithmetic".


I see, interesting. Yeah at that point you have more structure there. But yeah that's just like... two overlapping ring structures? Hm. <shrug>


I'm not even sure if we can call it modular. For example the % operator is defined as remainder rather than modulus if you're dealing with negative numbers. That's why -7 % 3 == -1 in C and -7 % 3 == 2 in Python.
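In C terms (a sketch with a hypothetical mod_euclid helper):

    #include <stdio.h>

    /* C's % truncates toward zero; this helper returns the
       always-nonnegative result that Python's % produces. */
    int mod_euclid(int a, int m) {
        int r = a % m;
        return r < 0 ? r + (m < 0 ? -m : m) : r;
    }

    int main(void) {
        printf("%d\n", -7 % 3);             /* -1 */
        printf("%d\n", mod_euclid(-7, 3));  /*  2 */
        return 0;
    }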


I'm not sure how this is relevant? For addition, multiplication, and subtraction, it is indeed modular arithmetic. For anything involving division it's altogether different, but, I took that as being outside what's being discussed, as nobody would mistake that for a finite field.


You're the one who chose to pull out the math terminology. Euclidean remainder isn't modulus. That's not even undefined behavior. For example, I could prove abs() isn't an abs() function by pointing out the abs(INT_MIN)<0 contradiction under two's complement arithmetic, if not for the fact that ANSI X3.159-1988 § 4.10.6.1 conveniently and explicitly un-defines that away. All it takes is one example of defined behavior that contradicts your mental model, to prove your mental model is wrong. So it's safer to assume that C is just C.


OK, sorry, but I'm not following what you're saying here. (Also, this comment seems needlessly hostile?) You seem to be making a bunch of assumptions that you haven't made explicit. I suspect we're simply talking about different things and therefore not actually disagreeing.

So, to try to be absolutely explicit about what I was saying:

I was not saying anything about the C programming language. I was talking about the more general notion of fixed-length integers; more specifically, ones that are either unsigned or two's-complement. Obviously, that is frequently applicable to the C programming language, but they are not identical, as, e.g., technically C doesn't require two's-complement for signed integers (and what I said would be false for other sorts of signed integers such as one's-complement). So I'm not talking about C, and the C standard is not relevant to what I was saying.

I am not saying anything about the % operator, in C or any other programming language. I am talking about modular arithmetic, where you fix a modulus m -- which in this case is 2^k, where k is the length of the integers in question -- and you form a ring by taking congruence classes modulo m, and you add and multiply congruence classes by picking representatives, applying the appropriate operation, and then taking the congruence class of the result. The particular choice of representatives is irrelevant.

As I already said, I am only talking about addition, negation/subtraction, and multiplication (and the constants 0 and 1, if you want to be really nitpicky). Other operations are not relevant to what I'm talking about.

Now obviously, to represent congruence classes in a computer, you need to pick representatives; and so the modulo operator % (which, again, is not relevant to my point, but I think I need to address this anyway) returns a number rather than some sort of notional congruence class object, because, well, what else is it going to do. So, it returns a representative of the congruence class rather than the congruence class itself, because there's no way for it to do the latter.

As you point out, in C (as of C99 anyway), as well as several other programming languages, the % operator, when evaluating n % m, returns a representative from { -|m|+1, ..., 0 } if n is negative, instead of { 0, ..., |m|-1 } like might be expected. This is a quirk of the % operator, for sure, but it doesn't have anything to do with what I'm talking about. Also note that the ways that programming languages handle negative n or negative m vary quite a bit; some do what C does, some use the sign of m rather than n, and some always return a nonnegative result.

The distinction you are drawing between "modulus" and "Euclidean remainder" is, at least in mathematics, not a standard one; as such, if you want to make use of it, you are going to have to explain what you are talking about, rather than just assuming I know it. (I mean, to be clear, there is a distinction between the two, but it's not the one you seem to be drawing. Rather it's that Euclidean remainder is a number rather than a congruence class; is only defined in Euclidean domains; and isn't actually required to be unique. But that's tangential, we're just talking about integers, I'm going to assume you mean the obvious thing by "Euclidean remainder".) In math, at least, the result of modding out is usually a congruence class, rather than a specific representative, unless you specify otherwise or the context makes it clear that that really is what you mean (e.g. using "mod" as an operator). When "mod" is used as an operator, and n is negative, I'd assume that the author intended { 0, ..., |m|-1 } as their set of representatives, rather than mimicking C's % operator, unless they specified otherwise.

As best I can tell, you seem to be using "modulus", as opposed to "Euclidean remainder", to indicate the specific behavior of C's % operator. I would suggest that this terminology is not as standard as you seem to think it is and will confuse people; this is both not how it is used in mathematics, and not how it is used in lots of other programming languages. If you are trying to refer to that particular behavior, you should use a more unambiguous description.

But -- I say this again -- none of this is really on the point I was making. The point I was making was this: Addition, subtraction, and multiplication on fixed-width unsigned integers of length k, with all overflow ignored, consists exactly of operating on the numbers in Z/(2^k), and then taking the representative that can be expressed in k bits unsigned. Moreover, the same is true for fixed-with two's-complement integers of length k; you operate in Z/(2^k), and then take the representative that can be expressed in k bits 2's-complement. That is to say, you're just doing modular arithmetic. The numbers add, subtract, and multiply exactly like their corresponding congruence classes in Z/(2^k). That's all I'm claiming. Again, none of this is about the particular behavior of any C operators.
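That claim is cheap to verify exhaustively at width 8; a C sketch:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* Compare signed (two's complement) and unsigned +, -, * for
           every pair of 8-bit values; the bit patterns always match. */
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++) {
                uint8_t ua = (uint8_t)a, ub = (uint8_t)b;
                int8_t  sa = (int8_t)ua, sb = (int8_t)ub;
                if ((uint8_t)(sa + sb) != (uint8_t)(ua + ub) ||
                    (uint8_t)(sa - sb) != (uint8_t)(ua - ub) ||
                    (uint8_t)(sa * sb) != (uint8_t)(ua * ub)) {
                    puts("counterexample found");
                    return 1;
                }
            }
        puts("all 8-bit pairs agree");
        return 0;
    }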

Now that I have utterly belabored this point, is there anything, on that main point, that you disagree with?


Yeah, I'm never going to live this down ;P. But, I seriously drafted this initially with "modular rings" and then was like "ugh, that's totally just a term I made up, right?" and changed it to "finite fields" because the last time I was staring at this for long periods of time I was programming a ZKsnark, and I had a vague memory that "all rings are fields but all fields are not rings" :(.


(And I just remembered the actual thing: all rings are groups, but not all groups are rings :/.)


Bear in mind the absolute value is a red herring; the problem is negating a signed number, which is extremely common, and hardly anyone guards against that.


Well, nor do most people guard against absolute value failing, even inside their implementation: most people are simply wrong... that said, I also think matter-of-factly "negating" a signed number--rather than asking for the additive inverse of a field element (in which case -128 is the correct answer for -128, since when you later go to add it to another number it will have the correct behavior, even if it doesn't look like it makes sense if you pretend it is a number)--is also usually a mistake ;P.


The reason why this is not signaled as an error in C/C++ with -ftrapv is because of the implicit integer truncation on assignment/initialization, which does not give errors in C/C++.

In C/C++, when -x is computed, x is promoted to int, so "-" does not overflow and "-ftrapv" does not matter.

This would be harmless, except that "return -x" truncates the result to 8 bits, which overflows, but this truncation is not checked so it does not signal an overflow exception, as it should in a decent programming language.

Had x been an "int", instead of an "int8_t", the error would have been caught in C/C++ with -ftrapv (when "-" is applied to INT_MIN), but "int16_t" would have also resulted in an error that is not caught, due to implicit unchecked truncation.

While forbidding all implicit conversions is a serious mistake in the opposite direction, many of the implicit conversions that exist in C/C++ for historical reasons are extremely harmful and they should be avoided like the plague, i.e. the truncation of larger integers into smaller integers and also the implicit conversions of signed integers into unsigned integers.
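Putting that chain together as a compilable C sketch, mirroring the article's int8_t example (even -ftrapv stays silent):

    #include <stdint.h>
    #include <stdio.h>

    int8_t absolute(int8_t x) {
        /* x is promoted to int, so -(-128) = 128 does not overflow... */
        return x >= 0 ? x : -x;
        /* ...and the narrowing back to int8_t silently wraps 128 to -128
           (implementation-defined, no trap). */
    }

    int main(void) {
        printf("%d\n", absolute(INT8_MIN));  /* -128, even with -ftrapv */
        return 0;
    }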


Note that this isn't just a gotcha when writing your own absolute value function. The code given in the example is also how Java's Math.abs() works.

In C++ std::abs(std::numeric_limits<int>::min()) is undefined behavior (not even implementation-defined: nasal flying simian warning!). Most importantly, the optimizer is allowed to make optimizations that are only safe if abs() only returns non-negative values, but abs(INT_MIN) can also return a negative value (or trap to the kernel/SIGFPE/SIGBUS/etc. on some exotic architecture).

So, in C++ one might be tempted to get around this issue by writing:

  #include <cstdlib>  // abs()
  #include <limits>   // std::numeric_limits
  using std::numeric_limits;

  static inline int my_abs(int x) {
    const int result = abs(x);
    if (result < 0 /* WRONG! UB! */ ) {return numeric_limits<int>::max();}
    return result;
  }
But the C/C++ optimizer is allowed to optimize away your conditional branch. You need to check for INT_MIN before calling abs() if you want to guarantee the optimizer won't thwart you.

More crazily, if the C/C++ optimizer discovers that some conditional branch is always conditioned on undefined behavior, it can just optimize away the branch, or even both sides of an if-else.

  #include <cstdio>   // puts()
  #include <cstdlib>  // abs()
  #include <limits>   // std::numeric_limits

  static const int x = std::numeric_limits<int>::min();

  void f() {
    puts("Before\n");
    if (abs(x) >= 0) {
      puts("Expected\n");
    } else {
      puts("Unexpected\n");
    }
    puts("After\n");
  }
The optimizer is allowed to determine that neither the if nor the else ever executes and optimize f() to { puts("Before\n"); puts("After\n");}

When I was at Google, Michael Chastain was telling me about one of the oddest bugs he helped debug (super helpful and knowledgeable guy, btw, one of the few people trusted with code ownership of the root of the monorepo). It was basically the above, but the UB was related to static initialization order of floating point variables.

  static double y = 22.0;
  static double x = y / 7.0;  // Uh-oh... depends on floating point static init order

  void f() {
    if (x > M_PI) {
      ...
    } else {
      ...
    }
  }
When my plan for world domination succeeds, one of my first acts as World Emperor will be to appoint a commission to change a lot of this undefined behavior into implementation-defined behavior.


Decided to try in rust to test the behavior [0]:

    fn absolute(n: i8) -> i8 {
        if n >= 0 { n } else { -n }
    }

    fn main() {
        let nums = [i8::MIN, i8::MIN + 1, 0, i8::MAX - 1, i8::MAX];
        for num in nums {
            println!("{}", absolute(num));
        }
    }

Seems to cause a runtime error:

       Compiling playground v0.0.1 (/playground)
        Finished dev [unoptimized + debuginfo] target(s) in 3.57s
         Running `target/debug/playground`
    thread 'main' panicked at 'attempt to negate with overflow', src/main.rs:2:27
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[0] https://play.rust-lang.org/?version=stable&mode=debug&editio...


That's correct. Rust provides four versions of fundamental arithmetic for the integer types:

* Checked where you get Some(answer) or None

* Overflowing where you get (answer,false) or (overflowed_answer,true)

* Saturating where you get answer or the closest of type::MIN and type::MAX to the correct answer

* Wrapping where the number line wraps around at the end

If you don't pick which you wanted, but then you do something that's poorly defined, a debug build will panic, whereas a production build wraps.

The panic in debug is to encourage you to actually decide which of the four operations you meant and write that.


There is no absolute value for -128 (or 0x80) in the range of int8_t, because 128 is not in 0..127. The error is not in the logic or the compiler but rather in the return type, which should be (at least) int16_t. Indeed, simply changing the return type to int16_t gives

  abs(-128) =  128
  abs(-127) =  127
  abs(   0) =    0
  abs( 126) =  126
  abs( 127) =  127


It just moves the problem. You can keep pushing the problem up to int64_t, but it does not go away.

  abs(-9223372036854775808) = -9223372036854775808
  abs(-9223372036854775807) =  9223372036854775807
  abs(                   0) =  0
  abs( 9223372036854775806) =  9223372036854775806
  abs( 9223372036854775807) =  9223372036854775807


Does the type flow back from the return type to the negation operator? I had no idea C worked like that! I would have assumed -x was the same type as x and then it was converted for the wider return type.


Arithmetic operations are never done in a type narrower than int. int8_t is necessarily narrower than int (int must be able to represent at least the range -32767..32767), so in -x the x is promoted to int, then negated.
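
The promotion is visible in the type system; a quick C++ sketch (C behaves the same way):

    #include <cstdint>
    #include <type_traits>

    int8_t x = -128;

    // Unary minus promotes its operand first, so -x has type int, not int8_t.
    static_assert(std::is_same_v<decltype(-x), int>, "operand is promoted to int");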


I believe x gets promoted to int, negated and then narrowed to the return type.


That is what the LLVM IR says happens:

https://godbolt.org/z/4aexEK3aE


I discovered that abs can return a negative value in Java while running FindBugs some years ago; it blew my mind how booby-trapped a language can be.


You should instead say "it blew my mind how booby-trapped CPUs can be."


This is why I like Ada, even though I haven't used it in 20 years.


I was surprised to discover that -ftrapv did nothing. So the obvious next step was to test on Ada, which I expected would catch this error on the first try.


There seems to be a small inconsistency in the article that confuses me a bit.

In the first example, the branch condition is `>= 0`, but in the second set of examples it's `> 0`. Which is it?

Either way, a neat point is being made here. I'm surprised -ftrapv doesn't catch it.



This reminds me of an issue I had when pushing 64-bit integers from a C# stack down to JS. Lots of weird issues, which I finally tracked down to the fact that JS doesn't have full 64-bit precision for numbers. IIRC it's something like 56 bits.


53 bits.

If this seems strange to anyone, it comes from the fact that all numbers in JS [1] are represented as double-precision floating point values. The bit format for doubles is a sign bit, 11 bits for the exponent, and 52 bits for the fraction.

When storing integers in floats, you use an exponent of 1, and then stuff the integer into the fraction and sign bits.

[1] I think bigint is a thing now?
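
The 53-bit cutoff is easy to demonstrate in any language with IEEE doubles; a quick C++ sketch:

    #include <cstdio>

    int main() {
        double a = 9007199254740992.0;  // 2^53: beyond here, double spacing exceeds 1
        double b = a + 1.0;             // 2^53 + 1 is not representable; rounds back to 2^53
        printf("%s\n", a == b ? "equal" : "distinct");              // prints "equal"
        printf("%s\n", a - 1.0 == a - 2.0 ? "equal" : "distinct");  // prints "distinct"
    }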


> When storing integers in floats, you use an exponent of 1, and then stuff the integer into the fraction and sign bits.

Is this correct? I would have just guessed that JS stores integers as their equivalent float values, which is why above 53 bits, you stop being able to represent all whole numbers (but you can represent some whole numbers), unless JS-land has changed since I last checked in almost a decade ago.


This is how you get the equivalent float value from an integer, assuming it fits within that 53 bit range. You're correct that you could pick a larger exponent at the expense of being able to store consecutive integers.

Edit: Ignore this, I'm wrong.


No, it is not. There is a "hidden" one bit: the fraction part "XXX..." of an IEEE 754 float represents 1.XXX..., to be multiplied by two raised to the (offset) exponent. The exponent field for the integer 2 in f64 is different from the exponent field for the integer 1; this is easy to verify:

    const std=@import("std");

    test "IEEE fp representation" {
      std.debug.print("\n{x}\n", .{@bitCast(u64, @as(f64, 1.0))});
      std.debug.print("\n{x}\n", .{@bitCast(u64, @as(f64, 2.0))});
    }
    
gives

    3ff0000000000000
    4000000000000000
Note that the fraction parts for both 1.0 and 2.0 are entirely zero, as each is a power of two times [1].00000000...

The "hidden bit", while tricky and hard to understand, is probably one of the few truly good decisions that IEEE-754 comittee made out of a lot of kludgey compromises. Note that if they had done it the way you suggest, there would be degenerate representations for the same number, leading to a much smaller total possible representations for the same bitspace (this is a minor problem for most object-based "Decimal" number systems used for financial Txs -- you often need to use a specialized equality function instead of the language builtin), the "hidden one" ensures that every binary form has a uniquely represented value, not counting craziness with NaNs, and plus/minus zero, sigh.


Doh. You're right. And that's even why they call it the fraction, because it's the fractional component of that 1.XXX number between [1.0, 2.0).

My brain fart was forgetting how floats work: I saw that the integer-resolution range for both single- and double-precision floats was exactly the number of bits in the fraction plus the sign bit, and pattern-matched from there.


I think it would be interesting to explore a type where you nan-box integers, which is what I thought you might be talking about, and it would be fascinating if JS did that. Don't know if it's worth the space savings over a simple union type, tbh.


This. But you can get pretty close to "stuffing in the bits". What we used to do to get a random float:

Set the unbiased exponent to 0 (a biased exponent field of 0x3FF) and stuff random bits into the significand. This gets you a random float in the range [1,2). Now subtract 1.0.

Note that this misses many floating-point values that should be selectable if you wanted all floats between 0 and 1 (the results are evenly spaced multiples of 2^-52, so the denser representable values near zero never occur), but it was good enough for video games.
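
A sketch of the trick in C++ (std::mt19937_64 here is just a stand-in for whatever 64-bit random source you have):

    #include <cstdint>
    #include <cstring>
    #include <random>

    // Force the exponent field to 0x3FF (unbiased exponent 0) and fill the
    // 52 fraction bits with random data: that's a double in [1, 2).
    // Subtracting 1.0 then gives a value in [0, 1).
    double random_unit(std::mt19937_64& rng) {
        uint64_t bits = (0x3FFULL << 52) | (rng() >> 12);  // sign 0, exponent 0x3FF, random fraction
        double d;
        std::memcpy(&d, &bits, sizeof d);  // reinterpret the bit pattern without UB
        return d - 1.0;
    }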

It's worth pointing out that IEEE 754-2008 has decimal numbers like you describe without the hidden bit. But for decimals this can be really useful because numbers "remember" their exponent. So if you're doing addition of dollar values, for example, .84 + .12 = 1.00 rather than 1 (i.e. it remembers its significant digits).


Remembering significant digits is not really that important in fintech applications, as you're usually in a fixed-point system that is pinned to the currency (dollar transactions usually go down to the cent, a handful of Arab currencies down to the millime, and yen are cut off at the single unit).


> [1] I think bigint is a thing now?

Yeah, pretty decently supported as well: https://caniuse.com/bigint

I ran into it with WASM u64/i64 values, which are also apparently decently supported at this point: https://webassembly.org/roadmap/


JavaScript stores all numbers as 64-bit floating points, so precision is limited to 53 bits (52 stored plus an implied leading 1).

There are, of course, edge cases. For example, despite the fact that numbers are floats, you can still perform bit operations on them. For these operations, the numbers are implicitly converted to 32-bit ints.

JavaScript also has Typed Arrays, so you can have an array of 32-bit unsigned ints. But not a u32 itself.


Nothing to see here, move along.

2's complement is negative-biased. 1's complement (unmentioned in the article) is zero-biased.

Neither surprises me.

Seriously, who cares? Very few of us. I'm one of them, but this isn't a problem for almost anyone else.


The remarkable thing is how many people seem to be working with computers without understanding how they work.


This issue could be even harder to detect with a generic Abs function. It would work on floats (I think?) and on almost all ints, but it would have edge case failures on certain ints.


In C++ you would do something like

    template <typename Number>
    Number Abs( Number x ) {
        if constexpr( std::is_unsigned_v<Number> ) {
            return x;
        } else {
            if( x >= 0 ) {
                return x;
            }
            if( x < -std::numeric_limits<Number>::max( ) ) {
                // What do we return??
            }
            return -x;
        }
    }

This comes up in parsing too. One often hardens against signed overflow by parsing into an unsigned type and then applying the sign at the end. One needs to check for these edge cases when the type is the biggest available, like intmax_t (generally int64_t).
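
A sketch of that hardening for int32_t (the helper name and Option-style error handling are mine):

    #include <cstdint>
    #include <optional>

    // Parse the magnitude into an unsigned type, then apply the sign at
    // the end, remembering that the negative range is one larger.
    std::optional<int32_t> parse_i32(const char* s) {
        bool neg = (*s == '-');
        if (neg || *s == '+') ++s;
        if (!*s) return std::nullopt;  // require at least one digit
        uint32_t mag = 0;
        for (; *s; ++s) {
            if (*s < '0' || *s > '9') return std::nullopt;
            uint32_t d = *s - '0';
            if (mag > (UINT32_MAX - d) / 10) return std::nullopt;  // unsigned overflow
            mag = mag * 10 + d;
        }
        if (mag > (neg ? 2147483648u : 2147483647u)) return std::nullopt;  // asymmetric range
        if (neg) return mag == 0 ? 0 : -(int32_t)(mag - 1) - 1;  // reaches INT32_MIN without UB
        return (int32_t)mag;
    }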


You can also (except this is awful!) work in the space of negative numbers through almost all of parsing and then, unless the sign is "-", negate the value just before returning it. However, having a base unsigned parsing routine and then doing signed conversion including application of unary +/- sign to the number at the very end allows sharing the most code.
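
Here's a sketch of that approach (names mine; it returns nullopt on any error):

    #include <climits>
    #include <optional>

    // Accumulate downward: every magnitude up to |INT_MIN| is representable
    // as a non-positive int, so INT_MIN needs no special case in the loop.
    std::optional<int> parse_negative_space(const char* s) {
        bool neg = (*s == '-');
        if (neg || *s == '+') ++s;
        if (!*s) return std::nullopt;
        int value = 0;
        for (; *s; ++s) {
            if (*s < '0' || *s > '9') return std::nullopt;
            int d = *s - '0';
            if (value < (INT_MIN + d) / 10) return std::nullopt;  // would overflow
            value = value * 10 - d;
        }
        if (!neg && value == INT_MIN) return std::nullopt;  // 2147483648 doesn't fit
        return neg ? value : -value;
    }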


The issue is that overflow on signed integers is UB. I can count the final number of digits (ptr - start_ptr) to check whether it is larger than the maximum number of digits allowed. This is cheap, as the check happens after the parse. Throw in a floor(log10(result)) == diff check or something like that too.


ARM NEON did it right. They have two sets of integer negate and absolute-value instructions, normal and saturating. For signed bytes, the saturating versions transform -128 into +127.


My favourite integer run-time error is -128/-1 (e.g. for int8_t), because even code that is careful to avoid division by zero will often miss this case.
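
In C, int8_t operands are promoted to int before the division, so the trap actually bites at int width and above: INT_MIN / -1 overflows (UB, and typically a hardware exception on x86). A fully guarded division might look like this (a sketch):

    #include <climits>
    #include <optional>

    std::optional<int> safe_div(int a, int b) {
        if (b == 0) return std::nullopt;                   // the check everyone writes
        if (a == INT_MIN && b == -1) return std::nullopt;  // the check everyone forgets
        return a / b;                                      // now guaranteed in range
    }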


By the way, in the glibc manual for abs, there is a note:

    Trying to take the absolute value of the most negative integer is not defined.


I find it interesting that so many responses are trying to dismiss this. I chatted with several programmer friends of mine, and they all answered the question (what is abs(-128) in int8?) wrong, and then got defensive and tried to play it off.

Be honest, it is not an obvious gotcha. Don't get defensive if you didn't realize the implications of abs(-128) = -128 @int8.


Ternary computers wouldn't have this issue: balanced ternary represents a range that is symmetric about zero.


Signed integers are symmetrical on architectures like the Sperry Univac 1100/2200 and derivatives (still available from Unisys, I believe) which use ones complement, and therefore have both positive and negative zero.


Blame the concept of zero for making things odd.

Edit: what idiot downvotes this?


Underrated comment. Since zero is even, is every number system with zero asymmetric?

Also, the commonly used Common Era system for dates does not include a year zero, so it is symmetric.

A year zero does not exist in the Anno Domini (AD) calendar year system commonly used to number years in the Gregorian calendar (nor in its predecessor, the Julian calendar); in this system, the year 1 BC is followed directly by year AD 1. However, there is a year zero in both the astronomical year numbering system (where it coincides with the Julian year 1 BC), and the ISO 8601:2004 system, the interchange standard for all calendar numbering systems, (where year zero coincides with the Gregorian year 1 BC; see conversion table). And there is a year zero in most Buddhist and Hindu calendars.

https://en.wikipedia.org/wiki/Year_zero



