I very much believe overflow checking should be on for release builds, not just debug. Overflow is defined by the language as an illegal operation, but if it's only checked in debug builds, then it's effectively defined as "somewhat" illegal. I think this gives the community the wrong impression, and a firmer stance should be taken. This is a killer feature, in my opinion.
In practice, for anything which isn't a micro-benchmark, it's a negligible performance hit. For things resembling a micro-benchmark, you can specifically use wrapped-arithmetic operations, or disable it for that module.
I see a strong correlation between people complaining about overflow checking overhead, and people who don't actually care about the safety features.
The operations themselves may only impose a performance hit of 10% or less, but the optimizations that those checks inhibit can easily cascade into a 2x slowdown, which isn't acceptable if Rust wants to compete with C++.
As for your correlation, I care a lot about safety, but I also recognize that the safest language in the world isn't going to pose any real-world improvement if it's not capable of competing with less-safe languages in their entrenched domains.
It starts with 2x, like kibwen says. Then it says those ops will only be a small part of the workload, or something along those lines. Extrapolating from that leads to 5%. The article goes on from there.
Not sure if it applies here, but it matches the claim you quoted about micro vs. macro benchmarks.
It would be interesting to have support for saturating types as well: u8saturated 200 + u8saturated 200 = 255. That's often the desired behavior.
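For what it's worth, Rust already exposes this per-operation via methods on the integer types; here's a tiny sketch of the behaviour described above (not a dedicated saturating type, just the method form):

    fn main() {
        let a: u8 = 200;
        let b: u8 = 200;
        // saturating_add clamps at the type's maximum instead of wrapping
        assert_eq!(a.saturating_add(b), 255);
        // wrapping_add, by contrast, wraps modulo 256: 400 - 256 = 144
        assert_eq!(a.wrapping_add(b), 144);
    }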
Saturated arithmetic is supported in hardware in SSE. Although not much point if the values need to be constantly shuffled between SSE and scalar registers...
That can easily lead to excessive branch target buffer invalidation and mess up branch prediction.
It might look acceptable in microbenchmarks, with just a 1-10% performance loss, yet be a total disaster in an actual running system.
A mispredicted branch costs about 15 clock cycles. You'll have a lot of those when the CPU can't do bookkeeping on all of your range-check branches anymore. The range checks themselves might run fast and fine, but the other branches your program has to execute can become pathological: the ones where the default choice for branches not present in the CPU's branch prediction buffer and branch target buffer is wrong.
15 clock cycles is rather excessive on any modern CPU that's supposed to execute 1-4 instructions per clock cycle! In the worst pathological cases this can mean an order of magnitude (~10x) worse performance.
Of course data flow analysis might mitigate need for actual range checks, if the compiler can prove the value can never be out of range (of representable values with a given type).
Regardless, it is important to understand that the worst-case price can be very high; this type of behavior should not be the default in any performance-oriented language.
To add to that, and to avoid any misunderstanding: the cost is probably within 1-10% in likely scenarios on most real-world code.
Pathological case would be code with a lot of taken branches mixed with range/overflow checks.
For range checks, any sane compiler would generate code that doesn't branch if the value is in range. If no branch predictor entry is present, branches are assumed [1] to be not taken. These branches can still thrash the BTB (a kind of cache thrashing) and cause taken branches that fell out of the BTB to become always mispredicted, and thus very expensive.
[1]: Some architectures do support branch hints, even x86 did so in the past. Modern x86 CPUs simply ignore these hints.
It's very hard for an overflow check to be mis-predicted. On x86, for static branch prediction, it's sufficient to generate code with a forward branch to be predicted as unlikely the first time. From that point, as the branch will never be taken (unless it actually triggers a panic but then who cares), it will keep being predicted as not taken.
A correctly-predicted not-taken conditional jump on x86 has 0 latency. In the specific case of overflow, you don't even need to generate a cmp instruction. It basically means that it's close to a wash, performance-wise.
The actual negative effect that could be measurable is disabling optimizations (like composition of multiple operations, as you need to check each one for overflow).
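Roughly what this looks like in Rust terms (a sketch; exact codegen depends on compiler version and flags):

    // With overflow checks enabled, a plain `a + b` compiles to roughly
    // an `add` followed by a forward `jo` to the panic path: a branch
    // that is essentially never taken, so prediction handles it well.
    fn sum(a: i32, b: i32) -> i32 {
        a + b
    }

    // The explicit library form makes the check visible in the source
    // and returns None instead of panicking:
    fn sum_checked(a: i32, b: i32) -> Option<i32> {
        a.checked_add(b)
    }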
> Overflow is defined by the language as an illegal operation
Is it? What do you mean by illegal here? Overflow can't cause UB madness in Rust like it does in C; the compiler does not optimize loops assuming no overflow.
I know that UB can be useful for optimizations. Where did I assert otherwise?
I'm saying that overflow is not undefined in Rust (the very first section of the blog post says this). I made this same mistake a few days ago and thought it was undefined (and implementation-defined to panic on debug), but I was wrong. Overflow is well defined in Rust.
Many people complaining about overflow checking either work in high performance computing, where it really matters how fast you can add two numbers, or in embedded systems where you can't afford either the extra space for the instructions, or the extra execution time.
I get hammered every time I mention this, but what happens when a primitive type overflows should be handled by the type system.
I.e., by default overflow is checked and is an error, but you can declare a type with something like twos_complement and the compiler will implement the expected behavior. Or, if you want, you can disable overflow checks.
This stuff really should not be global behavior, because 90% of the time the cost of overflow checks is nil, so you should do them.
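Rust has a partial version of this today with the std::num::Wrapping newtype, which puts the wrap-on-overflow decision into the type rather than a global build flag (a small sketch, not a full per-type overflow policy):

    use std::num::Wrapping;

    fn main() {
        // With Wrapping<u8>, modular (two's complement style) semantics
        // are part of the value's type rather than a compiler switch.
        let a = Wrapping(255u8);
        let b = Wrapping(1u8);
        assert_eq!((a + b).0, 0);
    }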
But the article specifically states that you can enable it globally, meaning you can either do it or not. And sometimes you want overflow, so you can specifically allow it at the function level.
I think the only sensible thing to do is to have distinct operators for integer arithmetic (where the impossibility to represent a result obviously is an error that should be caught) and wraparound arithmetic that you can use if you actually need wraparound semantics (or where you need the speed and can prove that it does give you the correct result better than the compiler can).
Swift has this and this is a sensible way to go but not the only one.
I personally would prefer integer overflow resulting in NaN like it does with IEEE floating point operations. Regehr has written about this option a bit more in [1].
On the other hand, as long as we have no hardware support this is not going to fly.
If we consider current popular hardware as given, want to be as safe as possible while still retaining performance competitive with C/C++ the solution the Rust community chose is a very good one and one I can live with.
Well, I would kindof consider that to be an implementation of that approach, if you think of NaN as an error return. The important part is that you don't return a value that is arithmetically incorrect for the expected semantics, but not recognizable as such by itself.
I think the approach of Rust is stupid because it makes the fast but insecure option the default for all the code that's written without thinking about it too much and that isn't in any way performance critical. Most arithmetic operations in code are not performance critical; only the few that are executed often are. So it doesn't make much sense to make the behaviour you need in that case the default, when it is also a security risk. And overflow checking only in debug builds is kind of useless from a security perspective.
Having different types instead of different operators also is weird. You don't have "additive integers" and "multiplicative integers" either, where the former's "+" operator does addition, while the latter's does multiplication (which is a form of (repeated) addition, after all!). If you have the same set of numbers to operate on, the obviously sensible thing to do is to have different operators for different ways to map pairs of those numbers to single numbers from the same set.
It's not the same as an error return because the NaN propagates through all subsequent calculations.
In numerical code very often there are no checks for NaN at all.
I'd even go so far to say that a lot of FORTRAN code in use today was written before NaN was introduced.
I/O functions like printf or WRITE treat NaN properly so if something went wrong you end up with the string "NaN" in your output.
In rust the checked_... methods that return an Option come close to the NaN behaviour.
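A sketch of what that looks like (the function is made up for illustration):

    // Each checked_* call yields None on overflow, and `?` propagates
    // that None through the rest of the computation, loosely analogous
    // to a NaN flowing through a chain of float operations.
    fn buffer_len(width: u32, height: u32, bytes_per_pixel: u32) -> Option<u32> {
        width.checked_mul(height)?.checked_mul(bytes_per_pixel)
    }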
NaN is a value that is returned to indicate that an operation could not be performed in the way that is normally expected. How is that not pretty much the definition of an error return?
Propagating an error return through subsequent computations is just monadic error handling ... which is, well, error handling?
Also, of course, there are NaN checks everywhere, as is evidenced by the error propagation. You can't propagate the sentinel value if you don't check for it. It just so happens that this check usually is implemented in hardware as part of every instruction for IEEE 754 numbers, so it's extremely fast in the error-free case and doesn't cause any code bloat, at the cost of some cycles being spent on otherwise useless error propagation in case an error occurs.
Now, what you are saying isn't exactly wrong--but I think it's much less confusing to think of it in this way.
The correct term is modular arithmetic, which is well established in maths for centuries. And it is properly backed by a really nice and almost dead-simple theory (at least from a programmer's point of view). Is this no longer taught in school?
If you like to emphasize the fact that it's mostly modulo a power of two (2^n), then call it two's complement. However, that overemphasizes the binary representation, which I find most of the time more distracting than helpful.
As a side note, integer division is usually not performed in modular arithmetics, but addition, subtraction and multiplication are.
It's modular arithmetic because of the internals of the processor, not because of some law-of-the-universe.
It could just as easily be that a processor doesn't allow over- or underflow operations. However, it's easiest to just truncate the number and essentially do modular arithmetic. That's the difference. Over- or underflow, checked, and wrap-around all refer to the state in which the operation's answer needs more space than it is given. Modular arithmetic and checked instructions are what to do in those situations.
Aside: division in modular arithmetic can be done, but it's not what most people want (hence why it's usually not performed). 4 / 3 mod 7 -> 4 * 5 mod 7 (5 is the multiplicative inverse of 3 mod 7, since 3 * 5 = 1 mod 7) -> 6 mod 7.
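In code, that inverse-based division looks something like this (a tiny brute-force sketch; mod_div is a made-up helper, fine only for small moduli):

    // Brute-force modular division: find the multiplicative inverse of
    // `divisor` modulo `m` (it only exists when gcd(divisor, m) == 1),
    // then multiply by it.
    fn mod_div(a: u64, divisor: u64, m: u64) -> Option<u64> {
        let inv = (1..m).find(|&x| (divisor * x) % m == 1)?;
        Some((a * inv) % m)
    }

    fn main() {
        // 4 / 3 mod 7 == 4 * 5 mod 7 == 6, since 3 * 5 == 1 (mod 7)
        assert_eq!(mod_div(4, 3, 7), Some(6));
    }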
> It's modular arithmetic because of the internals of the processor, not because of some law-of-the-universe
I beg to differ. Various integer representations were tried, such as having a separate "sign" bit on integers, but two's complement is what everyone finally settled on. So there's something deeply unique to modular arithmetic that none of the other representations could offer. This is not just about saving a few gates in the circuit.
> However, it's easiest to just truncate the number and essentially do modular arithmetic
The point is, there is no truncation step. It's just a side effect. If you build any naive adder, subtractor, or multiplier circuit, even if you totally ignore negative numbers, you automatically have fully working two's complement that handles negative numbers without any additional effort.
And why is that? Because the mathematical structure behind it is so dead-simple that it almost works by miracle (except for division, which is highly non-intuitive and has more pitfalls than the usual pitfall of division by zero).
As a consequence of this simple math structure, it is very easy to build in hardware without any additional handling of nasty special cases.
> As a consequence of this simple math structure, it is very easy to build in hardware without any additional handling of nasty special cases.
Which is exactly my point! It's only the way we build things, not some hard-and-fast way things work intrinsically. I have no argument that it's an extremely simple way to deal with an over- or underflow. But that's my point! It's a way to deal with the case, not the case itself!
> except for division, which is highly non-intuitive and has more pitfalls than usual pitfall of division by zero.
Division literally works the same way it does normally:
10 / 2 = 5 => 5 subtractions of 2 from 10 gives 0.
0: 10
1: 8
2: 6
3: 4
4: 2
5: 0
4 / 3 = 6 mod 7 => 6 subtractions of 3 from 4 gives 0
0: 4
1: 1
2: 5
3: 2
4: 6
5: 3
6: 0
I wouldn't say it's fraught with pitfalls, just that it's not common to see. FWIW, in an under- or overflow situation we normally don't want modular arithmetic, hence why we still attempt to trap instead of ignoring it.
I was referring to the non-prime modulo values, such as 2^32 or 2^64. In those cases, you do not only have "well-defined" and "division by zero", but also "division with multiple valid results". More generally, this is the issue of zero divisors. [1]
Modulo 2^n, you have proper division only if you divide by odd numbers; then you have division by zero; and finally you have division by any other even number, which is very different from what people are used to. There may be no solution, or two solutions, or many more.
Yes, you can still solve them by reducing the modulus, but that's the extra pitfall that you don't have with plain integers or rationals.
Same reason we talk about a "for loop" or a "while loop", instead of just a "conditional branch". Different words add different context to the concept that they're based on. For example, "modular arithmetic" doesn't include the context of checking for undesired leaks in Rust's arithmetic implementation. "Unchecked arithmetic" does.
Maybe it's an Apple thing. Sometimes they make up silly names like "Grand Central Dispatch" for executing a block in parallel in multiple threads. "ARC" makes me still think "adaptive replacement cache", not "automatic reference counting". At least they don't call it "automatic retain count". After all, they used to call "reference count" as "retain count". Confused the heck out of me on my first exposure to Objective-C.
Not that others like Microsoft don't do the same thing, renaming commonly known concepts to something they made up. Like DEP vs NX vs XD. Or Interlocked* for atomic operations. Atomic add in Microsoftese is InterlockedAdd.
Unchecked != modulo in Rust case, I think that the compiler is free to generate either a trap on overflow or a modulo result.
After all, MIPS has trap-on-overflow instructions...
> If you like to emphasize the fact that it's mostly modulo a power of two (2^n), then call it two's complement.
And it's signed, and negative numbers work by, well, two's complement...
Referring to unsigned integers, modulo a power of two, as "two's complement" is at least weird.
Referring to integers with a different encoding of negative numbers as "two's complement" would be incorrect, even if they still wrap around near a power of two.
I started my career 35+ years ago programming in 8080 and then Z80 assembly language. A few years later I learned C and have been happily using it and C++ (I admit it, I use it as C plus classes) ever since. Because of my background I remember distinctly one of my first thoughts when encountering C was, "how do I access the overflow flag?" It seemed a huge problem. But I must admit I forgot all about it, (and have never had an overflow issue that I can remember), until recently when there suddenly seems a profusion of "C is a terrible language because of undefined behavior problems" posts, articles and discussions.
So I do wonder how strongly coupled overflow concerns are with most practical every day programming. In my favourite problem domain of chess, I use uint8_t to represent say the number of white pawns (8 is the upper limit), and uint32_t to represent the number of games in my database (10,000,000 is a practical limit). Hitting 4 billion games is about as much of a problem to me as hitting 256 pawns. And uint64_t is waiting if I ever need it.
Using integer types with massively excessive capacity seems a pretty obvious, easy and painless strategy to use for most everyday programming. Of course I realise that there are domains and situations in which this would be an exceptionally naive way to look at it - but I suspect they are at least somewhat exceptional.
Part of the difference is that 35 years ago the potential attack surface of any given piece of software was, generally, much lesser. Today it seems like everything is ubiquitously networked, and any given person's personal information is stored on god-only-knows how many servers under the purview of god-only-knows how many different people.
I agree that integer overflow is a relatively minor concern for things like locally-installed non-networked chess applications, but as soon as you give attackers both an attack surface and an incentive to attack, the equation changes somewhat.
C was always a terrible language for security - this is the language with 'gets' in its standard library - but it was years before it was fully recognized. Longevity of poor practice isn't a defense. It's our modern networked world that pulled back the rug on the horrors below.
For overflow in particular, it's because numbers - memory sizes, specifically - got bigger than can be represented in 32 bits. There's a class of attacks that depend on fooling a program into allocating much less than it thought it was allocating via overflow, and using that as a vector for exploitation. In the past, these programs simply wouldn't have been networked, or the numbers involved would have caused an OOM, but now the combination of both causes problems.
How big of a problem it is depends on the domain. I work with graphics where overflows happen all the time: you can find images large enough that width * height * constant doesn't fit in 31 bits. Image formats have 1-2 byte integers everywhere. You have to be careful with rounding and clamping pixels to avoid 255 white from flipping to 0 black, etc.
> in debug mode, arithmetic ... is checked for overflow
> in release mode, overflow is not checked
...
> "But what about performance?" I hear you ask. It is true that undefined behaviour drives optimisations by allowing the compiler to make assumptions
Are there any targets that have instructions like integer-add-but-trap-if-overflow? Or maybe a mode that can be switched? It would be interesting if there could be a high-performing path to leave the check in for release mode.
> Are there any targets that have instructions like integer-add-but-trap-if-overflow?
No "modern" targets. MIPS and Alpha could trap on overflow (on MIPS `add` would trap, `addu` would not, not sure about Alpha) but neither x86_64 nor ARM have hardware support for trapping on overflow.
x86 had INTO, which would trigger INT 4 if the overflow flag was set. It still required additional instructions and the OS needed to convert that to an application trap, but there you are. It was removed in x86_64, likely because it wasn't used much.
In the examples this article shows, this will not help, as the overflow is in indirect addressing, not add instructions. I'm not a microcode expert, but I guess arithmetic instructions, even without a jo, are not as efficient as indirect addressing, so we can't just switch to arithmetic instructions with jo.
Perhaps we could have a new addressing mode that is the same as normal indirect addressing but falls through and sets the overflow flag if some part of it overflows. Then you'd do mov, jo, which could then be fused. But I don't really know what I'm talking about.
> It was removed in x86_64, likely because it wasn't used much.
Well, to be honest (anecdotal data) in my career of programming (with lots of C++), I must say that I can't remember having any actual problems with integer overflow.
But for security (buffer overflow protection), it could be useful.
When doing integer math with medium sized numbers on systems without floating point hardware, I have to be careful to arrange operations so they do not overflow, and sometimes it's still necessary to give up and use a 64-bit type on a 32-bit CPU.
An example calculation would be projecting a 3D coordinate into a 2D space.
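Something in the spirit of that projection, sketched in Rust (names and the 64-bit widening are illustrative, not anyone's actual code):

    // Perspective-project an x coordinate using integer math only
    // (assumes z != 0). Doing the multiply in i64 before the divide
    // avoids the intermediate overflow a pure 32-bit x * focal could hit.
    fn project_x(x: i32, z: i32, focal_length: i32) -> i32 {
        ((x as i64 * focal_length as i64) / z as i64) as i32
    }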
Rust already has bounds checking on buffers though, so that specific issue isn't a huge concern. However, there is a class of security bugs introduced by bad logic; it's just a little more difficult to exploit.
In my career of programming (doing a lot of x86 programming in the late 80s/early 90s) I never saw INTO used, and on modern systems, it's very slow and worse yet, at least under Linux, it's mapped to a SIGSEGV (http://boston.conman.org/2015/09/05.2).
If you don't deliberately send an overflow through every function that does any arithmetic, they could quietly give wrong answers sometimes and I don't see how you'd know that.
It is going to be less common in many fields but I can imagine 64-bit overflow is a concern in a fair few scientific number crunching situations.
Also, if you were to use the scaled integer method of representing fractional numbers (like the fixed-point data types seen in many SQL implementations) with a scale factor of 2^32 then a 64-bit overflow is a 32-bit overflow effectively (though I'll admit that is going to be no less rare than genuinely needing the full range of 64-bit integers).
And I'm guessing all this applies to underflow as much as it does to overflow, which could be a concern for unsigned integers of any size.
Do you know if there is any interest to re-introduce the feature? It'd certainly be nice to be able to eliminate overflow bugs without runtime overhead.
One problem I have from a mathematical standpoint with dealing with overflow is often you may want the check to apply to an entire expression rather than individual operations. For instance, the expression "x + y - z" may cause an overflow when adding x and y, but may be "recovered" when subtracting z. We could change the expression to "x + (y - z)" but now we have to resign ourselves to the fact that addition is no longer associative, and requires the programmer to know which operations should be grouped together. Is there a way to do bounds checking of some sort on these expressions to ensure there is no overflow in a decidable way?
Yeah, it can definitely be annoying when an expression as a whole doesn't overflow, but subexpressions do... however, chains of addition and subtraction are special in this regard, even something like (x + y) / 2 (from binary search) is problematic. Of course, chains of + and - like that do pop up, so maybe an analysis of them would be valuable. In any case, that sort of recovery is a major reason that overflow is defined to wrap when checking is off: errors may cancel out.
And x + (y - z) doesn't even fix the problem. y-z can still underflow, while (x+y) - z wouldn't. Being able to do it on the expression overall would be ideal.
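One workaround today is to widen and check once at the end, which effectively checks the whole expression (a sketch):

    // Check the expression as a whole: evaluate it in a wider type, then
    // convert back once at the end. Intermediate overflow of the 32-bit
    // values cannot occur in 64 bits.
    fn add_sub(x: i32, y: i32, z: i32) -> Option<i32> {
        let wide = x as i64 + y as i64 - z as i64;
        i32::try_from(wide).ok()
    }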
Yeah, I'm wondering how that would look at the assembly level. In the "x + y - z" example we get an overflow from the "x + y" and an underflow when we subtract z. So you'd have to check whether the number of overflows and the number of underflows are equal, and the assembly would probably be really slow. You'd want some kind of bounds-check optimization to try to remove as many of those checks as you can.
Fortunately, I don't think these issues will appear too often (at least they haven't in my experience), and you can probably get away with just adding unit tests. To be completely thorough you should probably use some sort of formal modeling and proof system, which I would like to see someone try and develop for Rust.
I don't understand their logic. If "for X in" gets rid of many cases where overflow checking might hurt performance, what is the argument for disabling it (by default) in release builds, in a language that aims to be safe? Do they think it is that important to win benchmarks run with default compiler flags?
Also, I expect Rust will still do bounds checking, and would guess many of the cases where the release build removes checks for overflow will now require some extra out of bounds checks: those for negative indices.
There are different levels of safety. Rust does not compromise on memory safety. Rust does compromise on integer overflow safety. This is not to say that memory safety is all that matters: rather it's to say that the line has to be drawn somewhere, and memory safety is where it was drawn.
> I expect Rust will still do bounds checking, and would guess many of the cases where the release build removes checks for overflow will now require some extra out of bounds checks: those for negative indices.
No, for two reasons: (1) you can only index arrays by unsigned values in Rust; (2) twos complement means that at the CPU level you can check for both negative indices and positive out of bounds indices by simply using one unsigned compare instruction.
The article talks about three different levels of optimization.
- Detect and trap overflows.
- Assume that two's complement math is a thing, and overflow can lead to additions producing negative values.
- Assume that overflow never happens, and optimize on this basis.
Rust does #1 in debug mode, #2 in release mode except when explicitly told not to. C compilers do #3. #3 is what leads to developers screaming at their compilers.
Arithmetic still happens a lot in programs, and especially so in tight loops (most tight loops will be filled with arithmetic, rather than anything else), even if it doesn't happen as the loop counters.
You are correct that Rust does bounds checks in release mode, but fortunately there are no extra bounds checks needed:
- all indexing is done with unsigned values, and,
- even if signed values were used, the '< 0' comparison is still not necessary because one can first cast to unsigned (a noop) and then do a single '< length' comparison: negative values will be huge, at least as large as MEMORY_SPACE/2, which is in fact the limit on array lengths in Rust[1].
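For illustration, that cast-to-unsigned trick in one line (a hypothetical helper, since Rust indexing is already unsigned):

    // The single-compare trick: reinterpreting a signed index as unsigned
    // makes any negative value enormous, so one `< len` comparison rejects
    // both negative and too-large indices.
    fn in_bounds(index: isize, len: usize) -> bool {
        (index as usize) < len
    }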
If you allow overflow in your module, and then another module uses yours but disallows overflow, what happens? Could the overflow behavior be encoded in an integer's type to make this kind of code mixing explicit (via casts)?
This is all hypothetical (in Rust), because there's no implementation. However, I would think that per-module enabling would be sticky, disregarding any call/use-sites: a module defining a hash function would say that overflow is OK/expected, and it would be incorrect for someone else to externally decide it is an error. (This also seems to be the only thing that is consistent: what if it is used from two modules with different options?)
What I mean is, suppose your module does an arithmetic operation involving a function from the other module, like "x + other_module_function(y) * 2". You've enabled panicking on overflow in your module for safety. Is it still possible that "other_module_function(y)" will return a result that overflowed without panicking? Does this mean that anyone writing code with overflow disallowed must carefully choose which modules they call, to avoid calling any that allow overflow?
Will it be possible to set some flag in Cargo.toml or in the compiler arguments to enforce "saturating" behavior everywhere it is possible (even at the price of some performance degradation)?
Why would you want that? "saturating" behavior is something you want only rarely. It certainly isn't something you want to use by default for arithmetic operations.
>"saturating" behavior is something you want only rarely.
Well, yes, because what you almost always actually want is the mathematically accurate result. Failing that, though, I can think of a few cases where you would want overflow to saturate (hardware outputs, mostly); I haven't thought of any cases yet where wrapping would be remotely useful.
Literally the definition of the integers is that (x + 1) != x. Wrapping is bad enough, but at least it tends to make failures noticeable. Saturating arithmetic makes it really hard to see that a failure has occurred.
Depending on the performance penalty you're willing to take, there's good alternatives to wrapping, such as: A) 64-bit integers, B) bigints, C) crashing on overflow.
Sure, saturation might be useful in certain circumstances, like manipulating audio samples, but it's generally something you want to make explicit. Using it everywhere? It's madness!
Say you're computing a buffer size with untrusted inputs.
With saturated arithmetic, you could add the (unsigned) sizes together without a possibility of an overflow, so you could eliminate all except the last range check (=branch).
If the end result is larger than what is reasonable, return an error. It's not possible that the value wrapped around and later code will be writing to unallocated memory.
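A sketch of the pattern being described (the names and the size limit are made up):

    // Sum untrusted sizes with saturating adds: the running total can
    // clamp at u32::MAX but can never wrap, so one final range check
    // suffices before allocating.
    fn total_size(chunk_sizes: &[u32]) -> Result<u32, &'static str> {
        const MAX_REASONABLE: u32 = 64 * 1024 * 1024; // assumed upper bound

        let total = chunk_sizes
            .iter()
            .fold(0u32, |acc, &len| acc.saturating_add(len));

        if total > MAX_REASONABLE {
            Err("unreasonably large input")
        } else {
            Ok(total)
        }
    }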
That doesn't actually eliminate range checks. Hardware doesn't have native saturating arithmetic operations, so the saturating methods are implemented by doing a checked operation and substituting the type's min/max value on overflow/underflow. Which is to say, it removes the checks from your code, but there's no performance benefit to it.
They do? I admit I'm not an expert on either x86 or ARM assembly, but I've never seen code that used a saturating arithmetic instruction, and the saturating math in Rust is implemented using checked arithmetic as opposed to any kind of llvm intrinsic for saturating math. Looking at the LLVM docs I don't see any kind of intrinsic for saturating math either (checked math uses llvm.sadd.with.overflow.* and friends).
How easy it is to decide for others whether they are mad or not, without even asking them about the details... No worries, I'm not proposing it as the default for everyone; I was just asking if it's possible to set it as a flag for some programs.
Because I'm sure it's the behavior I want 99% of the time, and the remaining 1% is the same behavior just with some warning, not a break in the execution flow. I'm not asking for it to be the default, just a flag. Thanks for moving the discussion away from my question...
It sounds to me like you probably misunderstand what "saturating" arithmetic is. It certainly is not the behavior you want for 99% of your arithmetical code. It is really quite rare to actually want that behavior.
"The current state isn’t necessarily the final state of overflow checking: the RFC even mentioned some future directions. Rust could introduce operators like Swift’s wrapping +% in future, something that was not done initially because Rust tries to be conservative and reasonably minimal, as well as hypothetically having scoped disabling of overflow checking (e.g. a single function could be explicitly marked, and its internals would thus be unchecked in all modes). There’s interest in the latter particularly, from some of Rust’s keenest (potential) users Servo and Gecko."
There are already methods for all intents, and while (as noted in TFA) it's possible wrapping gets an operator, it's really unlikely that saturating or checked would get one.
The article already mentioned the idea of using separate operators to control this behavior, similarly to how Swift has a separate set of operators for wrapping arithmetic.
There's no flag to enforce saturation, although as discussed in the article and above, saturation isn't as nice as wrapping in several ways (in particular, errors do not cancel out like they do with wrapping).
There are some architectures that support saturating arithmetic in hardware, DSPs for example. I wonder if the Rust saturating arithmetic commands get turned into those automatically?