Two's complement integers with only sign bit set should be a trap representation

eru · on Feb 7, 2024

From the linked discussion:

> @jrose @fclc People who want NaN in their integers are welcome to do that in the privacy of their own bedrooms with new types they invent for this purpose.

That reminds me of OCaml's 63 bit integers. In the privacy of OCaml's own bedroom, they use one bit of their integers as a flag to distinguish pointers from integers.

See https://blog.janestreet.com/what-is-gained-and-lost-with-63-...

The results are quite.. interesting. However, it does go to show that having slightly weird private integer types is actually totally workable, so the quote from the discussion might be meant as trolling, but it's less crazy than it sounds like.

rob74 · on Feb 7, 2024

The difference is that (a) that bit in OCaml has a purpose, while disregarding 0x80..00 hasn't; and (b) I assume that it was in OCaml from the beginning, so it wouldn't break existing software in probably very interesting ways (hence the suggestion that this should be done only "in the privacy of their own bedrooms with new types", i.e. not for already-used types)...

JKCalhoun · on Feb 7, 2024

Yes, correct me if I'm wrong, but I thought the point was that 0x8000 is like negative zero and so of no utility, redundant. So let's make it useful.

gridspy · on Feb 7, 2024

A couple of things that are special about 2s compliment

- There are no redundant numbers like -0

- Adding two integers together in this format works with negative and positive numbers with no special case.

-1 is actually 0xFF or 0xFFFF etc. So 0xFF + 0x01 (-1 + 1) "overflows" to 0.

mbel · on Feb 7, 2024

It's not. 0x8000 is the smallest (-32,768) integer that can be represented with 16-bits and two's complement.

pornel · on Feb 7, 2024

abs() of this value also happens to exceed the maximum positive value. Which means that in C, a simple negation can cause signed integer overflow -(-32768), and therefore UB.

taeric · on Feb 7, 2024

Yeah, I think it is funny that in a standard binary, it would replace negative zero and makes some sense. In two's complement, it replaces the minimum value.

Fair that sentinel value is sentinel value, I suppose. But it is not like it is not a known and unique number in that encoding.

hoseja · on Feb 7, 2024

> > @jrose @fclc People who want NaN in their integers are welcome to do that in the privacy of their own bedrooms with new types they invent for this purpose.

And yet I bet they use IEEE754 floats without second thought.

wruza · on Feb 7, 2024

I agree with the quote and use IEEE754 floats with second thought every time. In my perfect world floats don’t have inf, nan and base-2 exponent. And integers are 0-sized exponent floats.

temac · on Feb 7, 2024

> In my perfect world floats don’t have inf, nan and base-2 exponent

I'm quite unsure what do you want as floats then?

marcosdumay · on Feb 7, 2024

It's quite clearly decimal floats, with instant trapping instead of bad values.

IMO, with the sheer amount of silicon modern processors use for incredibly specialized gains, there's no reason our computers should be specialized for scientific computations when they could support daily computations too.

You obviously do not want decimals on your GPU, but the choice on the CPU is out of wack.

eru · on Feb 8, 2024

Floats seem to be pretty good for running graphics processing. (I would also say AI training and inference, but the machine learning people are constantly experimenting with weird new number formats.)

The 'daily computations' you mention only need a tiny amount of computational power. So it's fair for our CPUs to simulate them via library code.

You should however complain about a programming language that makes it unnecessarily hard to use those 'daily computation' numbers. As a positive example, Python automatically gives you unlimited size integers by default, and only gives you size limited integers, if you explicitly ask for them.

grotorea · on Feb 7, 2024

Do we need support or is just doing it on the CPU without specialized instructions fast enough?

marcosdumay · on Feb 7, 2024

Good question. Today we are incredibly biased into base-2, using it many times where we should be using base-10 instead. So the performance of our current software is not a good example.

But maybe we can get a good example if we just divide our current numbers by instruction types and multiply the FP time by ~100.

I have seen many studies that measure how much time software spends on FP instruction. But unfortunately, we live in a web search dark age, so I don't have references. I remember that FP-heavy code spends 10% to 20% of the time on those, what means that emulation will increase your execution time up to something on the 10x - 20x range. That's perfectly fine for a lot of applications, but does limit several niches.

But as far as I know, the question of how FP-heavy is the FP-heavy software that does daily calculation is unanswered.

eru · on Feb 8, 2024

What kinds of applications do you have in mind that are (a) CPU bound (or GPU bound), and (b) would benefit from your 'saner' numbers?

Eg graphics in general or video decoding specifically is definitely something that does a lot of processing; but I don't see how it would benefit from eg base-10 numbers?

Spreadsheets might benefit from base-10 numbers by default; but they are not CPU bound at all. Mostly they just sit around and wait for user input.

wruza · on Feb 8, 2024

ERP is all about numbers, and lots of them. While you may rightfully argue that it's a job for SQL servers, the lack of fast base-10-exp numbers in apps creates a very inconvenient and performance-bound gap in development of these systems.

Also, ieee754 floats usually leak into most general purpose scripting languages from the fast-ish category, cause there's no hardware alternative. Two main issues with this are: 1. people who do scripting may be unaware of numeric issues related to base-mismatch, 2. "domestic" calculations and formatting still become cumbersome for those who know. Another issue is that to properly build decimals in (think native + - / * operators), you have to modify the language model around that, and it comes with all sorts of prices.

In short, people want the results to compare correctly to their pocket calculators. "A human number is [+-]<decimals>.<decimals> in a ~20-digit-max window" is an idiom everyone is familiar with and all business requirements assume that by default.

marcosdumay · on Feb 8, 2024

I guess all of the potential applications take the form of spreadsheet or ETL number crunching.

A lot of those go undone because of the costs, or live with wrong results because they are only viable in binary.

Some database queries are also CPU bound, and those are overwhelmingly about decimal numbers.

Symmetry · on Feb 7, 2024

Decimal floating point is a standard that some people use and which even has hardware support in some IBM cores.

https://en.wikipedia.org/wiki/Decimal_floating_point

eru · on Feb 8, 2024

You can do that, but I think for most applications where you want decimal, you can get away with fixed point computation?

Eg if you want to calculate with money, just specify everything in terms of cents (or 0.001 of a cent or whatever), and then use an integer data type. Perhaps even an unlimited size integer data type.

wruza · on Feb 8, 2024

And don't forget to adjust after multiplication, e.g. ($ * kg) = (price * qty) / (10 ^ qty_scale).

But not before you finished with a "sequence-point" in calculations, cause otherwise rounding will accumulate, e.g. ((price * qty) * (100 * (10 ^ pct_scale) - discount)) / (10 ^ (qty_scale + pct_scale)).

Thus no operator[+-*/%^] magic for you either. Also, all these temporary ...0000000s will bite into your "mantissa" part along the way. Fixed point sounds reasonable until you have to actually implement a sales/debt/tax/trade-related routine that takes a couple of pages with proper decimals alone.

afiori · on Feb 8, 2024

I feel like at some point the performance hit of using actual rationals (pairs of bignums) makes sense

adgjlsfhk1 · on Feb 8, 2024

The downside of rationals is that they degrade extremely quickly as soon as you take a sqrt or do trig. Decimal floating point has the advantage of losing accuracy in the places people are used to.

afiori · on Feb 8, 2024

I believe that for almost all rationals trig functions never return a rational so you must have some kind of rounding tecnique.

An advantage of rationals is that you have much more freedom in handling how the rounding works, for example you can limit your rounding to produce fractions n/2^m and then they behave sort of like floats or n/10^m and then they behave like decimals while still being able to know that basic arithmetic operations (+, *, -, /, <, <=, <>, mod, exponentiation by integer) work as precisely as you want them to.

GPUs, ML, an massive numericall processing will want to always use floats, ints, posits[0] with operations that can be efficiently implemented in hardware but most apps could easily afford to use bigint/bigint rationals for most things.

[0] https://spectrum.ieee.org/floating-point-numbers-posits-proc...

tialaramex · on Feb 7, 2024

Typically in the unsafe languages under discussion they use the IEEE754 types without even a first thought. For example you can expect them to sort such types (you can define rules for sorting them, but if you just didn't in C too bad your program has Undefined Behaviour, while in C++ today it's even worse you've silently not written a C++ program at all and it never had any defined behaviour even before it tried to execute your invalid sort)

gpderetta · on Feb 7, 2024

std::sorting floats is perfectly well defined in C++ as long as you respect the precondition that there are no NaNs in your range. In other words the subset of floats without NaNs respects strict weak ordering.

edit: when using the default comparison. With a custom comparator you can do whatever you want, including using std::weak_order to sort NaNs.

tialaramex · on Feb 7, 2024

std::sort is defined in terms of a semantic requirement Compare - a constraint on types - which forbids types which don't have Equivalence. Specifically, our type needs to have some equivalence predicate equiv(A, B) such that equiv(A, A) is true, and the provided comparisons for floating point types in C++ do not meet this requirement.

I would be astonished if you're not correct at runtime in practice of course, but "It works in my environment" and "This is correct by the standard" are two different things.

LegionMammal978 · on Feb 7, 2024

From [structure.requirements]/4 (at [0]):

> Requirements are stated in terms of well-defined expressions that define valid terms of the types that meet the requirements.

I'd read this as separating the values of a type into valid and invalid terms, where the valid terms can be any arbitrary set of values that's closed under the required operations. (This is explicitly clarified for concepts in [structure.requirements]/8, saying that the functions and conditions don't have to be total over all values of the type.)

In this case, NaN would be considered an invalid term for std::sort(), so the compiler could break the program at a point only if it could statically guarantee that a NaN will inevitably make its way into std::sort().

[0] https://eel.is/c++draft/structure.requirements#4

tialaramex · on Feb 7, 2024

Staring at that text just gives me a headache, in particular "This does not affect whether a type models the concept".

I'm sure that everybody proposing and voting on this text was sure they agreed with it, however I'm doubtful that they all had the same meaning in mind, and I have no idea which one the author intended.

What is "this" here? The example, of a type which clearly doesn't model the concept it syntactically conforms to? The requirement? All of the preceding text ?

LegionMammal978 · on Feb 7, 2024

> What is "this" here? The example, of a type which clearly doesn't model the concept it syntactically conforms to? The requirement? All of the preceding text ?

Generally, pronouns in the standard are scoped to the paragraph they're in. In this case, I read it as saying that the type (e.g., float) can still model a concept even if the semantic requirements are violated for some particular values. (That is, "this" is "whether the required operation is partial or total".)

To support this reading, we can see from [res.on.requirements]/1 ([0]) that 'modeling' a concept is defined as meeting its associated semantic requirements:

> A sequence Args of template arguments is said to model a concept C if Args satisfies C ([temp.constr.decl]) and meets all semantic requirements (if any) given in the specification of C.

And 'meeting a semantic requirement' is mediated by [structure.requirements]/4, which I read to say that semantic requirements aren't universal rules for all values of a type, but necessary preconditions for values to be valid terms that can be passed into standard-library APIs.

By that logic, even though an operation might not "meet the semantic requirements of [a] concept when operating on NaNs", it doesn't kill the entire type, it just rules out NaNs from being valid terms.

I think the weirdness here is that the set of valid terms of any type is up in flux, to be defined by the ultimate user. But even if the operations don't meet the semantics over that entire set, the compiler has no way to prove that the program is ill-formed until the user actually generates one of the offending values, asserting that it truly is a valid term.

[0] https://eel.is/c++draft/res.on.requirements#1

tialaramex · on Feb 8, 2024

The problem with your interpretation is that it defies causality. The question of whether this is a well-formed C++ program isn't one that can be delayed until runtime, it either was, or was not, a well-formed C++ program when it was compiled - so if your reading requires that we don't know until runtime then it cannot be a correct reading.

Instead the models are about types, not about individual values or groups of values (except in the sense that a type "is" all its possible values). Unlike individual values, the types are known at compile time, so at compile time the types used mean it is or is not a well-formed C++ program in this regard. If we use a type which matches the syntactic requirements but does not model the Concept, our program is Ill-Formed and No Diagnostic is Required, we've written nonsense, we don't win a prize and our program has no defined meaning.

You seem hung up on the fact that we often can't be sure, that's Rice's Theorem, the option we'd prefer (accept exactly the set of programs with the desired semantics) is mathematically impossible, Henry Rice got his PhD for showing that a lifetime ago. If you want non-trivial semantic properties (and in a language like C++ you certainly do) then you have two practical ways to resolve this and get a working language. The first way is the one C++ chose, you can reject some obviously unacceptable programs, but whenever you aren't sure you assume the desired property is satisfied and don't worry about it further.

The other option (which I happen to think is the only reasonable choice) is the one Rust took - only programs which we can show have the desired semantic properties will compile, other programs are rejected. This means the compiler will reject some correct programs, which is somewhat annoying if it happens to you, but effort put into the compiler can reduce the frequency of its occurrence (and it did in Rust, that's what the Non-Lexical Lifetimes change was and what Polonius is all about)

LegionMammal978 · on Feb 8, 2024

> The problem with your interpretation is that it defies causality. The question of whether this is a well-formed C++ program isn't one that can be delayed until runtime, it either was, or was not, a well-formed C++ program when it was compiled - so if your reading requires that we don't know until runtime then it cannot be a correct reading.

Alas, the same issues with causality arise with the idea of "time-travelling UB". Suppose I write a program that prints a welcome message, reads a number from the standard input, then either exits if it's 0, or causes UB if it's 1. By the rules of the standard, if the implementation could presciently predict at the start of the program that I am about to enter 1, then it could skip the welcome message.

But of course, in the real world, causality is a higher law than the rules in the standard: if there's a possibility at any point that UB might not occur, then it must keep functioning as expected. So there's no way it can skip the welcome message, even though the standard allows it.

I see the question of ill-formedness from invalid terms as the same. At the start of each execution of the program, each type that must satisfy a requirement gets an invisible "set of valid terms" with respect to that requirement. However, this set exists only in the mathematical sense, in that it cannot yet be known by anyone, since it depends on future information. This is just like "the number that I will eventually enter" in my last example: it mathematically has some value (assuming that I do eventually enter something), but neither I nor the implementation can know whether it's 0 or 1 until I actually enter it.

Thus, the implementation is once again bound by the possibility that the "set of valid terms" might truly satisfy the requirement. It must execute the program as if it is well-formed, right up until the point where it is no longer possible that it is well-formed. And that point will generally correspond to a particular witness value (e.g., a NaN) that is passed into a standard function but makes an operation fail to meet its requirements.

If you want to put it into properly-causal terms, you could start the program with a really big "set of possible sets of valid terms". Then, each time at runtime that the program passes a value to a function with the requirement, you filter the sets to only those containing the value. If there are no sets left after this, then the program is now known for certain to have been ill-formed all along (there's no possibility remaining), so you can do whatever you want.

Of course, this does assume that a program being well-formed can change between each execution, even with the same source file. But it's hardly the only IFNDR in the standard that depends on behavior rather than syntax.

> The other option (which I happen to think is the only reasonable choice) is the one Rust took - only programs which we can show have the desired semantic properties will compile, other programs are rejected. This means the compiler will reject some correct programs, which is somewhat annoying if it happens to you, but effort put into the compiler can reduce the frequency of its occurrence (and it did in Rust, that's what the Non-Lexical Lifetimes change was and what Polonius is all about)

Since you mention Rust, I'd note that it has the same causality wackiness, just to a far lesser extent than C/C++, in the form of "angelic nondeterminism" (which is a term you can Google). Suppose that you have a pointer to the end of one slice and a pointer to the start of another slice, which happen to be equal, but don't have access to each other's slices due to provenance. Then, you convert both those pointers to integers, and convert that integer value back to a pointer.

Which slice should the new pointer have access to? Since integers have no provenance, the compiler has no way of knowing which pointer the address came from! Thus, the compiler uses "angelic nondeterminism" to decide: defying causality, it picks whichever of the two slices will make the program valid. If neither choice is valid (e.g., the program tries to access both slices from the new pointer), then it's instant UB.

Once again, to do this in the real world, the compiler must maintain a conceptual set of possible pointers that the new pointer might have come from, then whittle them down based on what memory is actually accessed. If there are no possible pointers left, then UB is known.

tialaramex · on Feb 8, 2024

> Of course, this does assume that a program being well-formed can change between each execution

Which can't be right.

> has the same causality wackiness, just to a far lesser extent than C/C++, in the form of "angelic nondeterminism"

It's OK, I don't need to Google, you'll find me in a conversation with Martin Uecker in a previous HN thread about PNVI-ae-udi which is the way C or C++ people would think about the matter.

And actually as usual Rust is ahead of the game here, you say "the compiler has no way of knowing" - actually with Aria's experiment in play we can insist on always telling the compiler what's going on, and as a reward we get both more assurance of correctness because the tools work (Miri for example) and better performance. The experiment is about whether most Rust software can live with this restriction ("Strict Provenance") and the answer IMNSHO is clearly yes.

Rust's pointers get to have member functions (as do its primitive integers and any other types) and Aria's experiment adds the method with_addr() to pointers, so now we can tell a pointer (which has provenance) that we want a different pointer with a different address inside it (but preserving the provenance) via the method.

More pertinently to our discussion, even with PNVI-ae-udi plus a mistake that results in an impossible pointer this is a cause of Undefined Behaviour, which occurs at runtime, it is not a semantic requirement of the language, our Rust program was well formed and correctly translated into machine code if we did this wrong, it's just that it had Undefined Behaviour at runtime as a result of us writing bad unsafe code.

Because Rust defaults to rejecting programs it knows always panic there can be the false impression that such programs are not well-formed. You can just insist to the compiler that you want the results anyway, it will duly spit out an executable which... always panics. Not sure what you want that for, but you're welcome to it.

LegionMammal978 · on Feb 8, 2024

> And actually as usual Rust is ahead of the game here, you say "the compiler has no way of knowing" - actually with Aria's experiment in play we can insist on always telling the compiler what's going on, and as a reward we get both more assurance of correctness because the tools work (Miri for example) and better performance. The experiment is about whether most Rust software can live with this restriction ("Strict Provenance") and the answer IMNSHO is clearly yes.

The standard counterexamples being XOR linked lists, or manual paging with mmap, or manipulating addresses as integers within asm!(), or literally any situation where you can't afford to keep the address as a single value-with-provenance sitting at a known location in program memory until you need it.

Strict provenance is sufficient for most use cases, but far from all of them, so there's no viable way to tear the whole concept of exposed addresses out of the language. In any case, removing it would violate basic expectations of stability, given that users have been converting pointers to integers and back since the very beginning of the language.

> More pertinently to our discussion, even with PNVI-ae-udi plus a mistake that results in an impossible pointer this is a cause of Undefined Behaviour, which occurs at runtime, it is not a semantic requirement of the language, our Rust program was well formed and correctly translated into machine code if we did this wrong, it's just that it had Undefined Behaviour at runtime as a result of us writing bad unsafe code.

My point is that there's no difference in semantics, nor any difference with respect to causality, between an "ill-formed program" that's only ill-formed under certain conditions, and a program with "Undefined Behavior at runtime". The standard ascribes no requirements on the entire execution, start to finish, in either case. But practically, if the implementation doesn't have enough information to distinguish a valid execution from an invalid execution until a certain point (i.e., runtime-dependent UB or ill-formedness), then it must behave properly right up until that point, by the laws of causality.

Still, I agree that just calling requirement violations UB would be far less confusing.

tialaramex · on Feb 12, 2024

> The standard counterexamples being XOR linked lists, or manual paging with mmap, or manipulating addresses as integers within asm!(), or literally any situation where you can't afford to keep the address as a single value-with-provenance sitting at a known location in program memory until you need it.

Well there's a crucial distinction in play here. The XOR linked lists, and some related tricks can be done with address exposure, we expose the pointer and get the integer address, then later when we reconstruct that integer we're allowed to turn it back into a pointer. Rust of course has (as part of the same experiment) APIs for that, you're just paying in terms of worse performance and no tooling (the tooling is disabled if you want these APIs).

But the other cases are just voodoo magic, we don't have a model to explain why they should work, which also makes it very hard to, for example, debug them if they don't. They're platform specific. As with the inline assembler, Rust takes no responsibility for whether you can write a correct program in these circumstances, good luck.

I don't agree that there isn't a practical difference between IFNDR and UB, I see attempts to confuse these as part of a desire to pretend that C++ isn't markedly worse at correctness than competitors, without the hard work to actually stop it from being markedly worse.

gpderetta · on Feb 8, 2024

A well-formed C++ program is one that is "[...] constructed according to the syntax and semantic rules" (4.65).

And of course adherence to many semantic rules in C++ cannot generally be verified statically: "A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)." (4.1.2.5).

It sucks but that's the way C++ work, and it would be hard for it to be any other way without GC or going full rust.

tialaramex · on Feb 8, 2024

> A well-formed C++ program is one that is "[...] constructed according to the syntax and semantic rules" (4.65).

Yes, my argument is that as a result a C++ program which sorts floats in the standard way is not well-formed. The type "float" matches the syntax rules, but does not match the (unchecked) semantic rules for ordering. This is Ill-Formed No Diagnostic Required, IFNDR.

Not checking is a choice. You mentioned "full Rust" as the alternative here, but lest that be misinterpreted, the other choice isn't to do everything Rust does, only to make the same choice in this specific respect: Rust chooses to reject programs unless it can see that they meet the semantic rules, while C++ chooses to accept programs unless it can see why they cannot.

Rice's Theorem forbids the obviously ideal outcome (accept exactly the set of programs which have the desired semantic properties) but either of these alternatives delivers an implementable system, and I believe of those two options one is clearly unacceptable.

gpderetta · on Feb 8, 2024

> my argument is that as a result a C++ program which sorts floats in the standard way is not well-formed.

I think the argument is flawed: if any program that sorts floats is not well-formed because a float could be a NaN, then any program that dereferences a pointer is not well-formed as a pointer could potentially be null. Do you agree that's obviously not what's intended?

I will grant you that the standard is very much less than clear, but I think generically it uses the term "program" to refer to programs, part of programs or execution of programs.

re: full rust, yes I meant "reject programs unless it can see that they meet the semantic rules".

tialaramex · on Feb 8, 2024

I agree that dereferencing a nullptr is merely UB and your program isn't Ill-Formed as a result.

I distinguish sort because it requires the sorted type to match a bunch of concepts and those are defined such that if your type doesn't match the syntax requirements it just won't compile, but if your type doesn't match the semantic requirements (the "model" of the concept), which are just text in the standards document, well, too bad IFNDR.

I think you can attempt to read the Concepts wording as if it's not about types but groups of runtime values, but I think that gets to the causality problem, where we're deciding at runtime whether our program's text was ill formed before it was compiled.

Obviously C++ 17 sort didn't do any of this, because C++ 20 Concepts didn't get standardised until C++ 20 -- hence my reference to "C++ today". I believe in C++ 17 similar code just has UB.

Edited to add Also: Unrelated to our main topic, I think C++ 17 should have insisted on a Quality of Implementation requirement here which says yeah, if you gave us faulty comparison function, your data does not get "sorted" whatever that could mean, but we're not going to give you Undefined Behaviour. After all C++ did eventually forbid naive Quicksort, which is a QoI decision.

Rust can cheerfully "sort" a bunch of my misfortunate::Always types, which insist they exhibit Total Order but are lying. Obviously after we've "sorted" them they're not "sorted" because that's meaningless, but Rust isn't going to run off the end of a buffer or cause other nonsense, we broke it => we bought it, no more than that.

gpderetta · on Feb 7, 2024

std::sort has a semantic requirement on the type of Compare, but Compare itself only has a constraint on the values it is comparing. From the 27.8.1.3 [1]:

  "comp shall induce a strict weak ordering on the values."

It is not terribly clear what "values" is referring to, but form context it must be all values that are being sorted, as opposed to all possible values of T.

The only explicit constrain on T is that it must be Move Assignable and Constructible.

[1] https://eel.is/c++draft/alg.sorting.general#4

tialaramex · on Feb 7, 2024

I see what you're saying about defining "values" as specifically only the values which, at runtime, we happen to be using Compare on, which is an idea I hadn't considered before - but I'm not sure how to square that with the way the rest of the language is defined.

Maybe one day the proposed "Appendix" of all UB and IFNDR will actually explain clearly what the intended rules are. Corentin Jabot's "UB? In My Lexer?" showed that the ISO document is still very uh, unpolished after decades of effort.

eru · on Feb 8, 2024

Could you explain the difference between C and C++ here? I thought they deal with undefined behaviour in approximately the same way?

tialaramex · on Feb 8, 2024

My argument is that what's happening here in C++ is not merely Undefined Behaviour.

UB is a runtime thing, at runtime your program tried to do this operation we said mustn't ever happen, so all bets are off. C++ also has this idea of programs which are Ill-Formed, No Diagnostic Required, meaning that this was never a C++ program, the ISO document doesn't tell you what this program will do, and a compiler isn't expected to tell you that you didn't write a C++ program in these cases either, so too bad but it's not our fault.

IFNDR is there because these are non-trivial semantic properties, so by Rice's Theorem they're Undecidable, the compiler cannot be sure if every arbitrary program meets the semantic requirement. The ISO C++ document would need to document a check which can be performed, allowing compilers to accept only programs which pass that check, or something similar giving enough rope for them to avoid cases where your program compiles in Clang but not MSVC or vice versa, that sounds awkward to write, instead it just YOLOs the whole problem with IFNDR -- if any program unknowingly does not meet these requirements it was silently never a C++ program anyway, not our fault it doesn't work.

As it has grown over the decades, C++ has added more, and more, and more IFNDR to the text, because it sure is easier to wave away such programs as "That isn't C++" than to either insist compiler vendors write more diagnostics (Boring! Expensive!) or define the language more carefully so as to make the wrong programs impossible to write. They do not have a comprehensive list of IFNDR (nor UB) in their language today.

My belief (and some well known C++ people agree) is that the vast majority of real world C++ programs today are Ill-Formed by these definitions and so do not have and never had any meaning by the text of the standard. I think this is substantially worse than the situation in C.

eru · on Feb 9, 2024

> UB is a runtime thing, [...]

Compare and contrast https://stackoverflow.com/a/11546378

> FNDR is there because these are non-trivial semantic properties, so by Rice's Theorem they're Undecidable, the compiler cannot be sure if every arbitrary program meets the semantic requirement.

As pointed out elsewhere, Rice's theorem does not force this choice. You could also be conservative: insist on only disallowing all 'bad' programs at the price of banning some 'good' programs. C++'s type system already disallows plenty of 'good' programs anyway.

> My belief (and some well known C++ people agree) is that the vast majority of real world C++ programs today are Ill-Formed by these definitions and so do not have and never had any meaning by the text of the standard. I think this is substantially worse than the situation in C.

Thanks! That's the kind of answer I was trying to get!

flohofwoe · on Feb 7, 2024

At that point, why not introduce special i63 and u63 types and use the hidden bit 63 as 'soft carry flag' to detect over/undeflow. 63 bits of integer range ought to be enough for anyone ;)

(as soon as bit 63 is set, the number would be in 'overflow state', but at the same time the lower 63 bits would preserve the wrap-around value in case that's useful).

On the other hand, compilers are already free to check the CPU carry flag bit and trap on it. Those flag bits are just not exposed to programming languages (which IMHO is a serious 'dissonance' between assembly and high level languages), also CPU flags can't be carried around with their associated result value and go 'viral' like NaNs (but whether that's even useful instead of trapping immediately is another question).

jusuhi · on Feb 7, 2024

Because i63 won't help with 32 bit variables.

flohofwoe · on Feb 7, 2024

...which is a bit of a weak reason when there are hardly any 32-bit CPUs around anymore though. Picking a narrower integer type makes sense in specific situations like tightly packed structs, but in that case it usually doesn't matter whether the integer range is 2^31 or 2^32, if you really need the extra bit of range it's usually ok to go to the next wider integer type.

jusuhi · on Feb 7, 2024

Feel free to generalize my argument to i31 and i15.

Also, look around you and realize that non-64-bit microprocessors outnumber 64-bit machines by orders of magnitude. There is more to the world than your laptop and smartphone. (Just within 1m of myself at this very moment I see 4 devices with a 32 bit microprocessor in them, and I'm sitting in a coffee shop reading a book. Well, and browsing HN now and then.)

flohofwoe · on Feb 7, 2024

There's actually a better solution if you allow the compiler some wiggle room:

Zig and C23 (and probably a couple other languages) support arbitrary width integers, so integer types like u15, i3, i1 or even u512 etc... are something entirely 'normal', and even useful in day-to-day coding when mixing signed and unsigned integer math (for instance Zig disallows to assign an unsigned integer to a signed integer of the same width, but it's perfectly fine to assign a 7-bit unsigned integer to an 8-bit signed integer).

Down on the CPU such operations happen on the register width, and if the integer width exceeds the register width, are unrolled into a chain of op-with-carry instructions (same way we did 16- or 32-bit math back on 8-bit processors).

saagarjha · on Feb 7, 2024

Surely your laptop and smartphone have several 32-bit microprocessors in them.

Findecanor · on Feb 7, 2024

Pretty much all "64-bit" CPU architectures out there also have specific instructions for 32-bit arithmetic.

They were designed that way partly on purpose, to cater for the enormous amount of existing program code that has been written with the assumption that an `int` is 32 bits. A large portion of that code also written with the assumption that they are also 2's complement and wrapping.

femto · on Feb 7, 2024

DSPs already do something like this, offering the option of saturating arithmetic operations, in addition to the normal modulo arithmetic. It's the operation that's special, not the representation in memory.

The normal way to do it is to use the "ETSI Basic Operator" library, which provides saturating arithmetic operations, among other things [1]. If the hardware supports saturated arithmetic the compiler maps the operations to the relevant instructions, otherwise it provides a software implementation.

Example:

Modulo arithmetic: int16_t a,b,c; a = b + c;

Saturating 16-bit arithmetic using ETSI: int16_t a,b,c; a = add(b, c);

[1] https://www.itu.int/rec/T-REC-G.191/en

Stenzel · on Feb 7, 2024

Processors with saturating capabilities usually clip at 0x7FFFFFF and 0x80000000, so the clipping thing has no relation to the original topic. But in fixpoint DSP, 0x8000 (add trailing zeros if you like) is often the only possibility to achieve a multiplicative identity by using a = - b * 0x8000, with the negation part of the assembly instruction.

tialaramex · on Feb 7, 2024

This is what nook:Balanced{I8,I16,I32,I64} are for. https://crates.io/crates/nook

Of course today in Rust we're not allowed to bless such types for ourselves, this is reserved to the standard library (core::num::NonZeroU32 is a 4-byte unsigned integer which can't be zero for example) because it's relying on a nasty permanently unstable compiler-only feature to say "Hey, not all possible representations the right size for this type are valid".

Rust has one very obvious set of types with such a niche, the references &T, but it also has not only the particular custom integers NonZero{U8,U16,U32,U64,I8,I16,I32,I64} and the Non-null pointers, but OS specific types (for Unix file descriptors, and Windows handles respectively) and of course char, it is 4 bytes but Unicode is explicitly limited to far fewer values, so that's a niche too.

But today making more is not allowed in stable Rust, so while my nook crate is an interesting curiosity (and some day I'll implement integer math traits etc.) it cannot be stabilised and I was persuaded that the path forward is a whole lot of work to deliver the Pattern Types feature which can do this stably.

kibwen · on Feb 7, 2024

> permanently unstable compiler-only feature

Arbitrary niches aren't intended to be permanently unstable (at least not in the sense of explicit implementation details like lang items and intrinsics), people have expressed a desire to stabilize them for years (as discussed during the original nonzero int types RFC). If something has changed in that regard, I'd appreciate a link to a discussion.

tialaramex · on Feb 7, 2024

The niche concept isn't permanently unstable, but the present mechanism is. Today the only way to make such a type is to use the magic attributes

#[rustc_nonnull_optimization_guaranteed] #[rustc_layout_scalar_valid_range_start] and #[rustc_layout_scalar_valid_range_end]

As their names hint, these are permanently unstable compiler-only type hints.

I asked that some means be found to stabilize some equivalent hint (presumably with a name that doesn't look like a compiler-only feature) in one of the long series of tickets spawned after 3334 and I was persuaded that this is a terrible idea and won't happen.

Instead the proposed path forward is (at least was then) Pattern Types. Build at least an MVP of the full blown Pattern Types feature instead of a custom attribute.

kazinator · on Feb 7, 2024

> should be a trap representation

In a new, green field programming language design: maybe.

You will have FFI interop issues: what if some foreign gives you such a value, which is correct in its domain. It will have to be specially handled and widened/promoted to a type that can hold it. It looks like friction, though.

In an existing language, like C? Making currently correct programs blow up with a trap exception is a nonstarter.

I think there is no hardware support for treating 0x80... as a trap, so if you actually want it to trap, rather than leaving it at undefined behavior, your compiler has to emit check code, at least in code paths where it is not obvious whether the value is or isn't a trap.

Also, it would be strangely inconsistent for the 0x80... to predictably trap wherever it occurs, but arithmetic overflow to have undefined consequences. The two want to be tied together.

UncleMeat · on Feb 7, 2024

Null is a valid value for pointers, but it is absolutely ordinary and normal for functions to refuse to accept null on the pain of crashing and burning.

kazinator · on Feb 7, 2024

Null being a valid value means it is not a trap representation.

Functions in C code bases checking for null is the rare exception rather than the rule.

When they do check, that doesn't turn null into a trap representation.

jcalvinowens · on Feb 7, 2024

No it isn't. In my experience, there are few higher quality signals a codebase is utter garbage than having `if (arg == NULL)` checks at the top of every function... Almost nothing in libc does that, it is entirely unnecessary.

I realize that wasn't your point, but this is an enormous pet peeve of mine...

UncleMeat · on Feb 7, 2024

A dereference is an implicit "I will crash and burn if you hand me null" contract.

jcalvinowens · on Feb 8, 2024

Yes, exactly. Nobody should be surprised that calling `strlen(NULL)` crashes. Checking for NULL at the top of strlen() would be stupid, which is why it doesn't do that. I've seen a lot of so-called "enterprise" code does that, and it's absolutely idiotic.

UncleMeat · on Feb 8, 2024

Okay. But that's totally unrelated to my original point, which is that we are already very used to functions that do not accept the full domain of a given type. "I don't accept nullptr" is not too fundamentally different from "I don't accept "0x8000..."

kazinator · on Feb 9, 2024

"I don't accept {nullptr, 0x8000}" is fundamentally different from "{nullptr, 0x8000} is a trap representation".

If it's a trap representation, you can't even pass it to me so that I could accept or reject it.

dkjaudyeqooe · on Feb 7, 2024

This is a simplistic view. I implement nil/null values everywhere including integers and at the external border there is a test for that special value. It's not like you're using this sort of thing for just another number so it's natural that there is an adaption.

owlbite · on Feb 7, 2024

Also bitfields / any application using the hi bit as a flag for something or other is going to break now.

kelnos · on Feb 7, 2024

> You will have FFI interop issues: what if some foreign gives you such a value, which is correct in its domain. It will have to be specially handled and widened/promoted to a type that can hold it.

That's more or less what you have to do in Java if your FFI is passing you an unsigned type. You need to use a 64-bit signed long in Java to hold a uint32_t, and a BigInteger to hold a uint64_t.

I mean, you can certainly store that unsigned 32-bit int from the FFI call in a 32-bit signed Java int, but then you have to use special methods on the Integer class to do unsigned comparisons and arithmetic.

Agreed, this is a lot of friction, but there's long-standing precedent, at least.

simiones · on Feb 7, 2024

Even if the current INT_MIN were redefined to be a NaN or a trap, that would do almost nothing to prevent signed integer overflow bugs. Sure, INT_MAX + 1 would be an error, but INT_MAX+2, INT_MAX+100 and so on would still be equal to INT_MIN + 1, INT_MIN + 99 etc.

Besides, -fwrapv was basically the default in most C compilers (and probably C++ as well) before the last 10 years or so. Many large systems thus use it today as well (e.g. the Linux kernel, I believe).

trealira · on Feb 7, 2024

> Besides, -fwrapv was basically the default in most C compilers (and probably C++ as well) before the last 10 years or so.

It seems like people were complaining as early as 2007. Richard Biener (who's linked in the thread link below) said that GCC 2.95.3, which came in 2001, optimized away signed overflow checks as well.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475

---

> signed type overflow is undefined by the C standard, use unsigned int for the

> addition or use -fwrapv

You have GOT to be kidding?

All kinds of security issues are caused by integer wraps, and you are just telling me that with gcc 4.1 and up I cannot test for them for signed data types any more?!

You are missing the point here. There HAS to be a way to get around this. Existing software uses signed int and I can't just change it to unsigned int, but I still must be able to check for a wrap! There does not appear to be a work around I could do in the source code either! Do you expect me to cast it to unsigned, shift right by one, and then add or what?!

PLEASE REVERT THIS CHANGE. This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE. I don't care if your language lawyers tell you gcc is right. THIS WILL CAUSE PEOPLE TO GET HACKED.

I found this because one check to prevent people from getting hacked failed.

THIS IS NOT A JOKE. FIX THIS! NOW!

simiones · on Feb 8, 2024

Thanks for looking into the history! Interesting to see that this wrong-headed behavior has now been around in GCC for longer than the status quo before.

Other compilers are different though - MSVC only seems to have started optimizing around this in ~2016 or so, icc seems to have done something around this in 2020, and I'm sure many embedded compilers ignore it even today. I couldn't find anything definitive on clang, but given its history I assume it probably always enforced this mistake.

I'd also note that several major C programs only work when signed integer overflow is defined: I mentioned Linux previously, but Python is another one. And Chrome and Firefox are considering enabling -fwrapv and equivalents.

trealira · on Feb 8, 2024

Yeah, you're definitely right that a lot of C and C++ code is written assuming signed overflow won't be optimized away. I think it was a mistake of GCC to do this optimization without some explicit flag like "-fremove-signed-overflow". I wish they didn't do that, but it's not up to me what they do. It's necessary for optimizations like loop unrolling, auto-vectorization, and compiling "for (int i = 0; i < N; i++)" as if it were written "for (size_t i = 0; i < N; i++)" when it's more efficient to do so, but again, I think this should be opt-in, and either -fwrapv or -ftrapv should be default on the generic -O1, -O2, -O3 levels.

drhagen · on Feb 7, 2024

R does this [1]. It is the sentinel that it uses internally to represent a missing value (i.e. NA) in integer arrays. Integer operations overflow to this value.

[1] https://stackoverflow.com/a/56507840/1485877

torstenvl · on Feb 7, 2024

Wait so we're going to give up -fwrapv (a UB mitigation) for the explicit purpose of introducing more undefined behavior?

I don't see any way this change wouldn't lead to an enormous number of bugs, all for the (dubious) goal of having the range of signed integers be symmetrical.

saagarjha · on Feb 7, 2024

-fwrapv is an UB mitigation that often just hides bugs and consequently creates more UB down the road, in places where it is less easy to diagnose. For example identifying an integer overflow is quite easy but placing bounds checks on all allocations in an ABI-preserving way is much more difficult.

atq2119 · on Feb 7, 2024

I think the idea is that overflowing would be a guaranteed trap of some kind, i.e. defined behavior.

eru · on Feb 7, 2024

You don't have to introduce more undefined behaviour. A trap could be very defined behaviour.

torstenvl · on Feb 7, 2024

No. What? No. A trap is by definition undefined behavior.

"Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undeﬁned. If such a representation is produced by a side effect that modiﬁes all or any part of the object by an lvalue expression that does not have character type, the behavior is undeﬁned."

akoboldfrying · on Feb 7, 2024

I think the idea would be to redefine "the object type" to make the trap value valid -- and to define the result of all operators on the trap value in some sensible way (similar to non-signalling NaNs in IEEE-754 floating point), so that, e.g., adding the trap value and any number produces the trap value again.

torstenvl · on Feb 7, 2024

It is not possible for a trap representation to be made valid. A trap representation is one where getting it or setting it, other than as a raw byte, is undefined behavior.

Section 6.2.6.1, paragraph 5, of C99, C11, C17, and—with a terminology change—the final draft of C23 are crystal clear on this.

atq2119 · on Feb 7, 2024

I think you may be too hung up on the C(?) definition of trap representation.

Clearly the OP suggestion isn't viable for default C integers anyway, and whichever language OP's suggestion is applied to is free to define the behavior of trap representations.

marcosdumay · on Feb 7, 2024

It's only undefined behavior (on C) because the (C) standard body decided to specify it that way.

There's no reason to specify things as UB as it's currently understood.

Gibbon1 · on Feb 7, 2024

A trap is something the programmer should have to ask for not something the compiler writer thinks is suitable punishment.

planede · on Feb 7, 2024

-ftrapv is an implementation-provided flag that defines the behavior of some operations that are otherwise not defined by the language standard. The implementation could define the behavior (under the same or different flags) to this newly introduced trap representation as well (no need to actually trap).

Having said that this should be a new type, adding more UB to existing types is a bad idea.

gpderetta · on Feb 7, 2024

you trade -fwrap for -ftrap, which is IMHO better. You would need hardware support for it to be even half efficient though.

I don't hate the idea, but doing it for existing int types might be a non-starter due to backward compatibility.

kevin_thibedeau · on Feb 7, 2024

Computers are implemented in hardware, not maths proofs. The Java generation is infatuated with magic numerics without realizing their performance cost.

lxgr · on Feb 7, 2024

What’s the “Java generation”, and why is premature optimization strictly preferred over sacrificing performance over correctness?

x86 offers hardware traps for integer division by zero while PowerPC returns zero – does that make x86 more wasteful/a “Java generation CPU”? What about CPUs that support native wrap trapping?

AdamH12113 · on Feb 7, 2024

I think the GP's point is that this would need hardware support. Without that, it's not something you could implement efficiently as a programming language feature -- you'd have to "trap" by checking the result of every arithmetic operation in software.

lxgr · on Feb 7, 2024

Python does just that for integers (since it automatically casts up to bigints instead of overflowing), as does Swift, and that's arguably fairly low-level/"compiled"! It doesn't strike me as an absurd idea.

Compilers could optimize for common cases where overflows are guaranteed to not happen, or perform the "isn't b100..000" check at the entrance to an inner loop and defer to the hardware's native overflow detection capabilities (if any).

trealira · on Feb 7, 2024

Lisp has converted fixnums (hardware integers) to bignums on overflow since the 1970s, too, so it's not just the "Java generation."

Their words:

> Computers are implemented in hardware, not maths proofs. The _ generation is infatuated with _ without realizing their performance cost.

This sounds just like the assembly programmers arguing against high level languages, who were against the overhead of unlimited named variables and recursive function activation records.

We should optimize for writing a lot of correct but slightly inefficient software. By keeping high-level language compilers (rather than assembly language programmers) in mind for the design of newer ISAs, we already have, to some extent.

That said, I still don't know about making the lowest signed number a trap representation in hardware. We already usually have a flag to detect signed overflow; why not just check that?

lxgr · on Feb 7, 2024

> That said, I still don't know about making the lowest signed number a trap representation in hardware. We already usually have a flag to detect signed overflow; why not just check that?

TFA mentions efficient representation of optional/nilable integers in parameters or return values as another benefit, which I do have some sympathy for (supplying MIN_INT or MAX_INT as an effective "nil" value has always felt wrong to me). I'm not sure if a hardware trap is the right mechanism to handle that, though.

nine_k · on Feb 7, 2024

Floating-point numbers are also directly supported by hardware, but they carved a few values for NaN, negative and positive infinity, and negative and positive zero.

I don't see why integers should not have something similar, especially large enough integers, like i64 or even i32. Historically they don't though.

GrumpySloth · on Feb 7, 2024

Floating point numbers also don’t obey normal algebra laws and are notoriously tricky to get right.

Two’s complement on the other hand form an honest-to-god commutative ring, their implementation is dead simple and it’s very rare that they lead to bugs in and out of themselves. They’re one of the most beautiful solutions in computer science IMO.

I’d give up floating point much more readily than two's complement.

usrusr · on Feb 7, 2024

I'd say the only "generational" part is that in this example the question has made it to the web. Who wouldn't wonder, when first leaning about two's complement, if unsigned plus sign (one's complement) might be preferable? I'd rather question the mind of people who never went through that phase. And the people who, in the early days, opted for one's complement for the hardware they designed certainly did not do so because of being too far from the metal. Both representations coexisted for a while, sometimes within the same system.

sham1 · on Feb 8, 2024

For slight pedantry, "unsigned plus sign" is the sign-and-magnitude representation, while one's complement negation is taking the bitwise NOT of the number in question.

saagarjha · on Feb 7, 2024

Why do you think the author of this is unaware of the performance cost?

silvestrov · on Feb 7, 2024

Rather than a trap representation I'd like a real trap, i.e. exception at runtime that can be caugth.

Propagated NULL is useful in SQL but likely much less for integers as they are used as indexes. It is difficult to say when the "NaN" should result in a trap when doing pointer arithmetic using such numbers as the CPU does not have seperate pointer/integer data types.

There are so many security problems from integer overflow. Also some easy way of checking an integer is in the range [0...n[ as a single instruction that traps if it is not.

adgjlsfhk1 · on Feb 7, 2024

the version of this I've really wanted is hardware support for integers mod 2^64-1 with 0x1000000000000000 as a nan value. the addition and subtraction and division would be trivial. multiplication would be a tiny bit more complicated, but should be relatively doable.

there are 3 main advantages with this system

1. ability to properly represent zero division or overflow without flags/exceptions 2. abs(x) is always greater than 0 3. a mix of these and regular Ints could be used with the Chinese remainder theorem to provide faster (almost) 128 bit math.

adgjlsfhk1 · on Feb 7, 2024

I would also be in favor of doing this for floating point (get rid of +-Inf, and -0.0 and make the only NaN value be the bitwise negation of 0.0).

snickerbockers · on Feb 7, 2024

This person's mind is going to be blown when he learns about bitmasks.

gregfjohnson · on Feb 7, 2024

Here is my favorite example of errors due to overflow:

std::vector<int> vec;

// test for sorted..

bool sorted = true;

for (size_t i = 0; i < vec.size() - 1ULL; ++i)

    if (vec[i] > vec[i+1]) {

        sorted = false;
        break;

    }

I believe the google c++ style guide rues the selection of unsigned integers as the return type of size() in the standard library for reasons like the above. Personally, my preferred behavior would be to have a compiler flag that could be used to enable a trap if unsigned arithmetic resulted in wrap-around, as in the above case when the vector is of length zero.

planede · on Feb 7, 2024

> Personally, my preferred behavior would be to have a compiler flag that could be used to enable a trap if unsigned arithmetic resulted in wrap-around, as in the above case when the vector is of length zero.

That probably has the potential of having too many false positives. It's also very non-confroming, so I see why compiler vendors would be reluctant to add such a flag.

jcelerier · on Feb 7, 2024

> compiler flag that could be used to enable a trap if unsigned arithmetic resulted in wrap-around, as in the above case when the vector is of length zero.

I mean.. this is exactly what -fsanitize=integer does in clang. I never saw a signed overflow UB, but how many "perfectly defined" yet horrendously wrong cases like this one this sanitizer has helped me find...

gregfjohnson · on Feb 7, 2024

Thanks so much - I was unaware of this clang flag. I just confirmed, it does exactly what I was hoping for!

I suggested that this be an optional flag, not the default behavior. I could see using it for unit-test/static-analysis builds where performance is not the main concern. (I have not looked at the generated assembly or done a performance comparison, but it might be that the -fsanitize=integer flag would slow the generated code down.)

jcelerier · on Feb 9, 2024

Anything that starts with -fsanitize is really only meant for debugging, definitely not production.

planede · on Feb 7, 2024

A trap representation is not a NaN-like sentinel value. The post also didn't discuss the updated semantics of integer operations, but I assume it just simply shrinks the range of representable values by removing -2^(N-1), and operations with a resulting value that lie outside are still undefined.

Having a NaN-like sentinel with is another option, with overflowing integer operations defaulting to that value instead, but that's not what is proposed here.

The post does mention optional<int>, which could have the NaN-like sentinel (empty optional) by reusing the bit-pattern of the trap representation of the underlying int.

lifthrasiir · on Feb 7, 2024

I would welcome something like sNaN but actually fully enforced and widely implemented in hardwares. Otherwise it would be just a pipe dream.

nynx · on Feb 7, 2024

Please no. A process dying automatically and immediately dying because of an arithmetic overflow would be terrible.

saagarjha · on Feb 7, 2024

Yes, it's much better when processes limp on with bugs that are difficult to diagnose and often exploitable.

Dylan16807 · on Feb 7, 2024

You can have a handler.

But are you sure exiting is more terrible than blindly forging ahead with a known-wrong number?

nynx · on Feb 7, 2024

Who said overflow was wrong? The sort of code I work on has to continue running, no matter what. Continuing on is always better than crashing, at least in my context.

It’s also basically impossible to establish invariants about a system for code that is run in signal handlers

lxgr · on Feb 7, 2024

And I’ll take a crash over a silent wraparound any day.

Out of curiosity, what domain is that? I can’t think of many use case where silent wrapping wouldn’t likely cause problems down the road.

torstenvl · on Feb 7, 2024

There are almost no circumstances where a crash in production code is preferable.

Can you imagine working on a complicated Excel document when you run into a 2038 bug and you lose all your work instead of just having one of your future dates calculated as being in the 1970s?

Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?

Or there are now 2.15 billion virus definitions, so your entire security suite just stops running?

Or your Tesla stops keeping you in your lane without warning because your odometer turned over?

Most things are not Therac-25. Much more software is protecting us and our data than is exposing us to danger. Loss of data, loss of navigation, loss of safety systems or life support... simply unacceptable. Turn on -ftrapv when testing and especially when fuzzing, sure, but production code should almost always be built with -fwrapv.

gizmo686 · on Feb 7, 2024

Excel maintains a recovery file of your open document. If it crashes, you can reopen Excel and will likely recover most if not all of your data. You are far more likely to corrupt data by keeping the files (primary and recovery) open in a program that is in a known-bad state.

If software crashes, it can simply be rebooted. I can't get the actual requirements for airplane control softward without paying hundreds of dollars. However, I would imagine that a reboot finish quicker than a pilot could open the manual to the "negative altitude indication" section.

lxgr · on Feb 7, 2024

> Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?

This is exactly how Airbus's computer control philosophy works: Bail out in case of inconsistencies/unexpected errors. In the best case, this means falling back to the other (one or two) still operational redundant instances of the same component. The remediation is to literally reboot the errored-out component manually, mid-flight, with absolutely no adverse consequences. For all you know, this has already happened on a flight you've been on!

In the worst case (which can happen if it's an internally-consistent error condition, e.g. something like a conceptually unforeseen condition that none of the equivalent system instances can handle), the entire system declares failure and bails out to a lower-level system, if needed all the way down to mechanic/hydraulic connections to the most critical control surfaces in case of e.g. the A320. This can mean that e.g. the pilots need to start flying the plane manually.

Flying manually is preferred over e.g. one of two systems thinking that the plane is upside down, with a compromise action of rotating 90 degrees along some axis.

gpderetta · on Feb 7, 2024

> Can you imagine working on a complicated Excel document when you run into a 2038 bug and you lose all your work instead of just having one of your future dates calculated as being in the 1970s?

if your program loses data on a crash it is not worth using. What is this, 1970?

> Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?

I'm pretty sure that's exactly what happens in fact: a quick reboot temporarily handling control to lower level or redundant controls is better than feeding wrong information to a sophisticated flight control system. I think this is generally true for all high reliability control systems.

astrange · on Feb 7, 2024

Crash-only software works for Erlang, and I think people expect telephony switches to be reliable. (Swift also crashes on overflows and runs your phone.)

Most of your examples would be just as bad if they started having unbounded incorrect behavior, and an overflow you didn't know about could lead to that. So don't get the math wrong!

cesarb · on Feb 7, 2024

> (Swift also crashes on overflows and runs your phone.)

AFAIK, over half of the mobile phones in the world, and probably nearly all of the fixed phones, do not run Swift, but instead some older language which does not have this "crash on overflow" behavior.

lxgr · on Feb 7, 2024

Many landline and mobile phone switches do run on Erlang. Not getting a connection in case of e.g. an integer overflow for "pick the next available trunk line" is preferable to kicking out an existing connection on line 0.

Note that an exception doesn't need to literally mean "the entire system comes to a screeching halt": You'd usually limit exception handling to the current operation.

torstenvl · on Feb 13, 2024

Nobody is talking about exceptions. We're talking about intentionally invoking undefined behavior.

You don't use trap representations to handle potential overflows. You check for possible overflow prior to the operation, and then handle appropriately.

quickthrower2 · on Feb 7, 2024

The plane example. Better to have software fault, stop and let redundant system take over. Which might be the pilot looks at another gauge. Pilot logs problem and gets looked at.

UncleMeat · on Feb 7, 2024

I totally disagree. Overflow almost certainly represents a bug and the violation of various invariants. Your code is doing something, even if it isn't giving radiation doses. Computing the wrong data and storing it can cause problems for customers, modify data you didn't plan on touching, or render data entirely inaccessible if various encryption operations are performed incorrectly.

Just like most things are not the Therac-25, most things are also not safety critical when they crash. Some web service that crashes just means that it returns 500 and you see it on whatever stability dashboard.

torstenvl · on Feb 13, 2024

Web services are a great example. Explain to me why it's preferable to give a constant 500 and lock me out of my account, rather than having a fucked up background image scaling or whatever else your web service would be doing with arithmetic.

nynx · on Feb 7, 2024

This is exactly right.

nynx · on Feb 7, 2024

I write software for spacecraft. I’d much rather take my chances that the overflowed value is meant for humans or for some tertiary system than to end the mission regardless.

bvrmn · on Feb 7, 2024

Oh shi, not very reassuring knowledge spacecraft SW engineers allow signed overflow in their code.

Am4TIfIsER0ppos · on Feb 7, 2024

I would guess they don't use a language where signed overflow is undefined. I use C all day every day and that's gotta be one of the worst legacy artefacts of it. As someone else said "it isn't 1970" the system is twos complement.

bvrmn · on Feb 7, 2024

Defined silent signed overflow (wrapping) is worse then UB. For UB there is at least runtime diagnostic. Parent clearly ok with `-fwrapv`.

Dylan16807 · on Feb 7, 2024

> For UB there is at least runtime diagnostic.

I have no idea what you mean by this. A couple types have diagnostics, but so do overflow and wrapping. Most don't. In general, it's much easier to have runtime diagnostics for arithmetic than for UB.

I guess if I squint and "silent" implies you're not allowed to install diagnostics, then it's worse? But that's so artificial of a comparison as to be silly.

Dylan16807 · on Feb 7, 2024

Even if the system is definitely twos complement, it's bad for i+1 to sometimes be less than i.

lxgr · on Feb 7, 2024

Unmanned spacecraft, hopefully? "Taking chances" is not a thing that I'd like the designers of a system that controls the vessel I'm on (or that overflies my house) routinely do during development.

Undefined behavior can transitively expand to the entire system very quickly. For any subsystem, the proper error handling for an overflow/arithmetic error that's not properly handled right in the place it occurs is to declare the subsystem inoperative, not to keep going.

If that approach routinely takes out too many and/or critical subsystems, you likely have much larger problems than integer overflows.

quickthrower2 · on Feb 7, 2024

Neural nets?

eru · on Feb 7, 2024

> Continuing on is always better than crashing, at least in my context.

In that case, you can just detect a crash (eg with an external watchdog), and replace your process with one that runs fizzbuzz.

Fizzbuzz is wrong, but so would be continuing-no-matter-what-happened in general.

MathMonkeyMan · on Feb 7, 2024

There are the kinds of defects that people notice, and then there are the kinds of defects that people don't notice. Crashes are usually noticed, while nonsense calculations and undefined behavior are sometimes not noticed.

I worked at a company that sold a product that some customers used to make decisions involving a lot of money. A program misbehaved, or gave the wrong answer, or returned from a function too soon, or something. It didn't crash. A customer lost a lot of money, and we were liable.

It turned out that the bug was a symptom of undefined behavior that would have been caught by one of the flavors of assertions built into standard releases of the core C++ libraries. The team in charge of the app in question had long ago disabled those assertions because "they keep crashing the code."

Infrastructure within the company then went through a small internal crisis about whether to force assertions down people's throats or to let it be wrong on prod. The compromise was to have the assertions expand to loud warnings on prod. Ops then had a tool that would list all of the "your code is provably broken" instances, and that went to a dashboard used the shame the relevant engineering managers.

Not sure if it worked, but when you have a _lot_ of C++ code, a great deal of it is broken and "it's fine." Until it isn't.

Dylan16807 · on Feb 7, 2024

> Who said overflow was wrong?

The word "overflow".

> It’s also basically impossible to establish invariants about a system for code that is run in signal handlers

Signals are a pain but it doesn't specifically have to be a signal.

kelnos · on Feb 7, 2024

Overflow is fine if you're aware of it and have code that either doesn't need to care about it, or can work around it.

Consider protocols with wrapping sequence numbers. Pretty common in the wild. If I increment my 32-bit sequence counter and it goes from 2^32-1 back to 0, that's likely just expected behavior.

nynx · on Feb 7, 2024

Overflow is perfectly well defined in most languages. What could it be other than a signal?

eru · on Feb 7, 2024

Alas, it's not well defined for signed integers in C (and C++) which are still popular languages.

Dylan16807 · on Feb 7, 2024

> What could it be other than a signal?

A normal exception without stack gremlins trying to break everything.

scotty79 · on Feb 7, 2024

All is fun and games until your negative number is suprisingly off by one because somebody, somewhere didn't convert between two complement and "symmetrical" ints correctly

dreamcompiler · on Feb 7, 2024

One of the things I love about Lisp is that i+1 is always greater than i. You can work hard to make that not be true, but that's the default and IMHO it's the proper default.

eru · on Feb 7, 2024

Well, that's the case for signed integers in C as well.

(But only on a technicality: because every program where that breaks is not technically C, and the compiler is allowed to assume this doesn't happen.)

I suspect the Lisps you talk about use arbitrary sized integers by default? Python also has that default. It's a good default. If you want to optimize for size and want an integer of limited size, like a 64 bits or so, you should explicitly specify.

astrange · on Feb 7, 2024

This is dangerous for security, because now your math operations are no longer O(1) and might allocate memory.

ack_complete · on Feb 7, 2024

Might allocate a _lot_ of memory. I've actually seen a case where some fatfingered a comparison in Python as a << b instead of a < b, and allocated gigabytes of memory on a server.

dreamcompiler · on Feb 7, 2024

Side channels are hard to prevent in all languages because of compiler optimizations, out-of-order execution, branch prediction, speculative execution, loop unrolling, eliding operations, register renaming, using the FPU under the sheets for ordinary integers, clock throttling, cache usage, rowhammer, etc.

Modern mostly-hidden, mostly-proprietary architectural execution models are hell for security and side channel prevention.

We need chips from 1984 with Ghz clocks if we want good security.

astrange · on Feb 8, 2024

Oh, this isn't (mainly) about sidechannel leaks. It's about DoS attacks.

nlitened · on Feb 7, 2024

I think, if security is involved, one would still need to do overflow checks in the exact same places, so nothing is gained and nothing is lost.

eru · on Feb 8, 2024

If you want to harder your code against timing attacks, you need to actively work hard, instead of just relying on some operations taking O(1).

(I don't even know whether your processor does regular multiplication and division in O(1)?)

astrange · on Feb 8, 2024

> (I don't even know whether your processor does regular multiplication and division in O(1)?)

Most of them don't, especially division.

MathMonkeyMan · on Feb 7, 2024

Yep, the [numerical tower][1].

Python's int is a bignum, but in Scheme you even have rationals built in. There's a notion of which numeric values are "exact" and which ones are "inexact" (i.e. floating point).

[1]: https://en.wikipedia.org/wiki/Numerical_tower

eru · on Feb 7, 2024

I remember the numerical tower from Scheme. That was an interesting concept. (Though mathematically, you might want something with a more complicated structure than a tower. Eg p-adic numbers would sit below integers, next to rational numbers, but not below them.)

Python has rational numbers in the standard library. They are easy to use there, too.

waqf · on Feb 7, 2024

Every rational number is a p-adic number (though not necessarily a p-adic integer, of course). So doesn't that mean p-adic numbers would be below rationals?

eru · on Feb 8, 2024

Sure, maybe. My broader point is that the tower would branch at some point.

z29LiTp5qUC30n · on Feb 7, 2024

So they want symmetric complement of the ancient Knight Processor where there was NaN encoded so that divide by zero wasn't an exception but just a return NaN.

adrian_b · on Feb 7, 2024

In the past I have also been tempted by this idea that perhaps it would be better to redefine the current INT_MIN to be a NaN.

Nevertheless, after thinking more about it, my conclusion was that this is not useful.

The strongest argument in favor of it is that then an optional integer needs only the same size as an integer.

Against this argument is the fact that in decades of programming I have never encountered the need for an optional integer type which needs for its base integer type all possible values.

What I have encountered very frequently were optional integer types whose base integer type should have been defined as an integer interval a.k.a. integer range, like it is possible in languages like Ada or Pascal.

For such optional integers with the valid values restricted to an interval, the optional type does not require extra storage over the base type, because the NaN can be represented by any value outside the permissible interval.

The second argument for redefining INT_MIN to be a NaN is that this eliminates the possibility of overflows in several integer functions, like absolute value and negation.

This argument is weak, because overflows can still happen in the more frequent arithmetic operations, so most non-trivial programs must still provide means to handle such exceptional cases.

Moreover, the exceptions are avoided in absolute value and the like, by still having a (possibly implicit, i.e. inserted by the compiler) check that an integer value is not a NaN, somewhere before the code that uses negation, divide by -1, etc., perhaps in the invoking function, not in the invoked function to which integer arguments are passed, so this in the best case replaces multiple checks interspersed with the operations with a single check in the beginning, at the origin of the integer data. On the other hand, implicit checks that an integer is not a NaN would have to be added after additions and the like.

The same is available now, in much of the code where integer overflows are possible one can check the range of the input integer data somewhere in the beginning, verifying that there are no values that can result in an overflow, avoiding further checks at each operation.

The redefining of INT_MIN to be a NaN could help by relying on implicit checks added by a compiler in a few cases, but most expressions contain additions, subtractions and multiplications, so the program must still rely either on the compiler adding implicit overflow trap checks after each operation, for which adding the integer NaN does not give any improvements, or in the programmer writing explicit range checks that depend on the expressions that are computed, and for the majority of the expressions the explicit checks would still be needed.

hoseja · on Feb 7, 2024

None of signed integer representations are very good, each has severe drawbacks.

fargle · on Feb 8, 2024

this strikes me as a chesterton's fence situation - "what kind of weird nonsense is this 2's complement where it's asymmetric (yuck) and you can't negate -0x8000"

i seriously doubt the folks who think this is a good idea know enough about the math and computer architecture to understand why in fact it's a terrible idea.

- programming languages are abstractions. but (old joel) leaky ones sometimes. we don't like this, but you have to live with it.

- in this case, the choice might be between reality-leaks such as "integers have limits", "the limits are asymmetric, leading to this negation pseudo-paradox" and a much worse leak such as "why does this spiffy safe integer type run 100,000 times slower than the weird C int"

- actually math itself is a series of ever complicated abstractions. first counting numbers. then ZERO. then NEGATIVE numbers. ACK! now wait until you get to Fields and Rings and Galois Fields, which are very close to the concept of 2's complement integers. i believe that if ignoring the botched C standard that makes signed integer wraparound UB, "normal" 2's complement integers (and unsigned) form a finite ring.

- so what? well even these fancy, and frankly, impenetrable math abstractions are just abstractions. they leak too and aren't "perfect" if you take into account the real universe which doesn't always match the perfect math. doesn't mean it isn't very useful.

- these concepts can be used to reason about the behavior physical things, like the behavior of an array of flip-flops representing a 64 bit number and especially how you can create circuits and simplify them to do things that implement addition, multiplication, etc.

- the problem with a sorta-1's-complement-without-two-zeros idea is it has none of these connections to grounded theoretical basis. so how do you simplify an adder? can you perform algebraic simplifications to it? what is the model for this thing?

- symmetry is in the eye of the beholder. draw a 4-bit signed 2's complement drawing using "clock diagram". it works. now do it with 1's complement or this suggestion which is even worse with an odd, sometimes prime, number of nodes.

- implementation: 2's complement is a very clever hardware technique that lets you interpret N bits as N digits of a binary number. each digit has weight 2^n: 1, 2, 4, 8, etc. except the last digit which has weight -2^n. this simple trick lets the signed and unsigned integers use the same addition, subtraction, and with minor processing, multiplication and division logic unchanged. it's very unlikely that you can implement the suggested scheme at even 1/10th the performance or cost.

- and, yes, i think C really got it wrong calling "overflow" or wraparound UB. if it was defined to wrap, at least it would be theoretically analyzable. you'd still have overflow flaws, but that's provably impossible to avoid with finite sized integers of any design.