
Myths about Integer Overflow in Rust - dbaupp
http://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/
======
pslam
I very much believe overflow checking should be on for release builds, and not
just debug. Overflow is defined by the language as an illegal operation, but
if it's only enabled by debug builds, then it's effectively defined as
"somewhat" illegal. I think this gives the community the wrong impression, and
a firmer stance should be taken. This is a killer feature, in my opinion.

In practice, for anything which isn't a micro-benchmark, it's a negligible
performance hit. For things resembling a micro-benchmark, you can specifically
use wrapping-arithmetic operations, or disable checking for that module.
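
For reference, a sketch of those opt-ins: the `wrapping_*` methods and the
`std::num::Wrapping` newtype request two's-complement wraparound explicitly,
in any build mode, while `checked_*` makes overflow observable instead.

```rust
use std::num::Wrapping;

fn main() {
    // wrapping_add never panics, even in debug builds.
    let x = 250u8.wrapping_add(10);
    assert_eq!(x, 4); // 260 mod 256

    // Wrapping<T> applies the same semantics to the ordinary operators.
    let y = Wrapping(250u8) + Wrapping(10u8);
    assert_eq!(y.0, 4);

    // checked_add surfaces the overflow as a None instead of panicking.
    assert_eq!(250u8.checked_add(10), None);
}
```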

I see a strong correlation between people complaining about overflow checking
overhead, and people who don't actually care about the safety features.

~~~
pcwalton
> In practice, for anything which isn't a micro-benchmark, it's a negligible
> performance hit.

Citation needed? Every paper I've seen has indicated a non-negligible
performance hit in real apps.

> I see a strong correlation between people complaining about overflow
> checking overhead, and people who don't actually care about the safety
> features

I'm a counterexample.

~~~
vardump
It would be interesting to have support for saturated types as well.
u8saturated 200 + u8saturated 200 = 255. That's often desired behavior.

Saturating arithmetic is supported in hardware by SSE, although there's not
much point if the values need to be constantly shuffled between SSE and
scalar registers...

~~~
Manishearth
.saturating_add() exists. You can create a wrapper type around this and
implement the Add/Sub/Mul traits.
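
A minimal sketch of such a wrapper (the name `Saturating` and the derive list
are illustrative, not from the standard library of the time):

```rust
use std::ops::Add;

// Illustrative newtype giving u8 saturating semantics for `+`.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Saturating(u8);

impl Add for Saturating {
    type Output = Saturating;
    fn add(self, rhs: Saturating) -> Saturating {
        Saturating(self.0.saturating_add(rhs.0))
    }
}

fn main() {
    // vardump's example above: 200 + 200 clamps at u8::MAX.
    assert_eq!(Saturating(200) + Saturating(200), Saturating(255));
}
```

`Sub` and `Mul` would delegate to `saturating_sub` and `saturating_mul` the
same way.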

~~~
vardump
Does it have branchless codegen?

~~~
Manishearth
Not right now (I'm surprised that's the case, there should be intrinsics for
it), but that could change.

------
billforsternz
I started my career 35+ years ago programming in 8080 and then Z80 assembly
language. A few years later I learned C and have been happily using it and C++
(I admit it, I use it as C plus classes) ever since. Because of my background
I remember distinctly one of my first thoughts when encountering C was, "how
do I access the overflow flag?" It seemed a huge problem. But I must admit I
forgot all about it, (and have never had an overflow issue that I can
remember), until recently when there suddenly seems a profusion of "C is a
terrible language because of undefined behavior problems" posts, articles and
discussions.

So I do wonder how strongly coupled overflow concerns are with most practical
every day programming. In my favourite problem domain of chess, I use uint8_t
to represent say the number of white pawns (8 is the upper limit), and
uint32_t to represent the number of games in my database (10,000,000 is a
practical limit). Hitting 4 billion games is about as much of a problem to me
as hitting 256 pawns. And uint64_t is waiting if I ever need it.

Using integer types with massively excessive capacity seems a pretty obvious,
easy and painless strategy to use for most everyday programming. Of course I
realise that there are domains and situations in which this would be an
exceptionally naive way to look at it - but I suspect they are at least
somewhat exceptional.

~~~
barrkel
C was always a terrible language for security - this is the language with
'gets' in its standard library - but it was years before it was fully
recognized. Longevity of poor practice isn't a defense. It's our modern
networked world that pulled back the rug on the horrors below.

For overflow in particular, it's because numbers - memory sizes, specifically
- got bigger than can be represented in 32 bits. There's a class of attacks
that depend on fooling a program into allocating much less than it thought it
was allocating via overflow, and using that as a vector for exploitation. In
the past, these programs simply wouldn't have been networked, or the numbers
involved would have caused an OOM, but now the combination of both causes
problems.
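
A sketch of how that size computation looks when the arithmetic is forced to
fail loudly rather than wrap (the `alloc_size` helper is hypothetical):

```rust
// A wrapped multiply here is exactly the bug the attack exploits: the
// program allocates far fewer bytes than `count * elem_size` and then
// writes past the end. checked_mul rejects the overflow instead.
fn alloc_size(count: usize, elem_size: usize) -> Option<usize> {
    count.checked_mul(elem_size)
}

fn main() {
    assert_eq!(alloc_size(4, 8), Some(32));
    // This product exceeds usize::MAX and is rejected.
    assert_eq!(alloc_size(usize::MAX, 2), None);
}
```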

~~~
pjmlp
The sad thing is that in 1961 we already had operating systems and programming
languages that took security seriously.

[https://en.wikipedia.org/wiki/Burroughs_MCP](https://en.wikipedia.org/wiki/Burroughs_MCP)

[https://en.wikipedia.org/wiki/NEWP](https://en.wikipedia.org/wiki/NEWP)

------
wyldfire
This is a great writeup, thanks!

> in debug mode, arithmetic ... is checked for overflow

> in release mode, overflow is not checked

...

> "But what about performance?" I hear you ask. It is true that undefined
> behaviour drives optimisations by allowing the compiler to make assumptions

Are there any targets that have instructions like integer-add-but-trap-if-
overflow? Or maybe a mode that can be switched? It would be interesting if
there could be a high-performing path to leave the check in for release mode.

~~~
masklinn
> Are there any targets that have instructions like integer-add-but-trap-if-
> overflow?

No "modern" targets. MIPS and Alpha could trap on overflow (on MIPS `add`
would trap, `addu` would not, not sure about Alpha) but neither x86_64 nor ARM
have hardware support for trapping on overflow.

x86 had INTO, which would raise INT 4 if the overflow flag was set; it still
required additional instructions, and the OS needed to convert that to an
application trap, but there you are. It was removed in x86_64, likely because
it wasn't used much.

~~~
amelius
> It was removed in x86_64, likely because it wasn't used much.

Well, to be honest (anecdotal data) in my career of programming (with lots of
C++), I must say that I can't remember having any actual problems with integer
overflow.

But for security (buffer overflow protection), it could be useful.

~~~
nitrogen
When doing integer math with medium sized numbers on systems without floating
point hardware, I have to be careful to arrange operations so they do not
overflow, and sometimes it's still necessary to give up and use a 64-bit type
on a 32-bit CPU.

An example calculation would be projecting a 3D coordinate into a 2D space.
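
A hypothetical illustration of the kind of arrangement involved: a 16.16
fixed-point perspective divide where the intermediate product must be widened
to 64 bits (the format and constants are made up for the example):

```rust
// 16.16 fixed-point: the low 16 bits are the fractional part.
const SHIFT: u32 = 16;

// x * focal overflows 32 bits for modest inputs, so widen to i64
// before the divide, then narrow the (in-range) result back down.
fn project_x(x: i32, z: i32, focal: i32) -> i32 {
    ((x as i64 * focal as i64) / z as i64) as i32
}

fn main() {
    let x = 300 << SHIFT;   // 300.0
    let z = 10 << SHIFT;    //  10.0
    let focal = 2 << SHIFT; //   2.0
    // 300.0 * 2.0 / 10.0 = 60.0
    assert_eq!(project_x(x, z, focal), 60 << SHIFT);
}
```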

------
konne88
What we really need is a hardware extension for ARM/Intel that traps on
under/overflow:
[http://blog.regehr.org/archives/1154](http://blog.regehr.org/archives/1154)

~~~
snuxoll
The overflow flag does a fine job of this, IMO.

~~~
kazagistar
The overflow flag requires an extra operation and branch to check. A trap is
"free" until it actually happens.
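
For comparison, Rust exposes the flag-check pattern as `overflowing_add`,
which returns the wrapped result together with an overflow bool that the
caller has to test explicitly:

```rust
fn main() {
    // The bool is the overflow flag; acting on it costs a branch.
    let (v, overflowed) = 200u8.overflowing_add(100);
    assert_eq!((v, overflowed), (44, true)); // 300 mod 256

    let (v, overflowed) = 200u8.overflowing_add(50);
    assert_eq!((v, overflowed), (250, false));
}
```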

------
openasocket
One problem I have with overflow checking, from a mathematical standpoint, is
that you often want the check to apply to an entire expression rather than to
individual operations. For instance, the expression "x + y - z" may cause an
overflow when adding x and y, but may be "recovered" when subtracting z. We
could change the expression to "x + (y - z)" but now we have to resign
ourselves to the fact that addition is no longer associative, and requires the
programmer to know which operations should be grouped together. Is there a way
to do bounds checking of some sort on these expressions to ensure there is no
overflow in a decidable way?
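
A concrete sketch of the phenomenon: per-operation checking rejects "x + y"
outright, while wrapping evaluation of the whole expression lets the
intermediate overflow cancel whenever the final value fits:

```rust
fn main() {
    let (x, y, z) = (i32::MAX, 5i32, 10i32);

    // Per-operation checking: x + y already overflows.
    assert_eq!(x.checked_add(y), None);

    // Wrapping the whole expression: the overflow in x + y cancels
    // against the subtraction, and the final value is the
    // mathematically correct i32::MAX - 5.
    assert_eq!(x.wrapping_add(y).wrapping_sub(z), i32::MAX - 5);
}
```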

~~~
simcop2387
And x + (y - z) doesn't even fix the problem. y-z can still underflow, while
(x+y) - z wouldn't. Being able to do it on the expression overall would be
ideal.

~~~
openasocket
Yeah, I'm wondering how that would look at the assembly level. In the "x + y -
z" example we get an overflow from the "x + y" and an underflow when we
subtract z.
So you'd have to check if the number of overflows and the number of underflows
are equal, and the assembly would probably be really slow. You'd want some
kind of bounds check optimization to try and remove as many of those checks as
you can.

Fortunately, I don't think these issues will appear too often (at least they
haven't in my experience), and you can probably get away with just adding unit
tests. To be completely thorough you should probably use some sort of formal
modeling and proof system, which I would like to see someone try and develop
for Rust.

------
Someone
I don't understand their logic. If "for X in" gets rid of many cases where
overflow checking might hurt performance, what is the argument for disabling
it (by default) in release builds, in a language that aims to be safe? Do they
think it is that important to win benchmarks run with default compiler flags?

Also, I expect Rust will still do bounds checking, and would guess many of the
cases where the release build removes checks for overflow will now require
some extra out of bounds checks: those for negative indices.

~~~
pcwalton
> in a language that aims to be safe

There are different levels of safety. Rust does not compromise on memory
safety. Rust does compromise on integer overflow safety. This is not to say
that memory safety is all that matters: rather it's to say that the line has
to be drawn somewhere, and memory safety is where it was drawn.

> I expect Rust will still do bounds checking, and would guess many of the
> cases where the release build removes checks for overflow will now require
> some extra out of bounds checks: those for negative indices.

No, for two reasons: (1) you can only index arrays by unsigned values in Rust;
(2) twos complement means that at the CPU level you can check for both
negative indices and positive out of bounds indices by simply using one
unsigned compare instruction.
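
A sketch of reason (2): reinterpreting a signed index as unsigned makes
negative values huge, so a single unsigned comparison rejects both failure
modes (the helper function is illustrative):

```rust
// One unsigned compare covers both negative and too-large indices,
// because -1 as u64 is u64::MAX, which is never < len.
fn in_bounds(i: i64, len: usize) -> bool {
    (i as u64) < len as u64
}

fn main() {
    let len = 10;
    assert!(in_bounds(3, len));
    assert!(!in_bounds(10, len)); // one past the end
    assert!(!in_bounds(-1, len)); // negative index
}
```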

------
rcthompson
If you allow overflow in your module, and then another module uses yours but
disallows overflow, what happens? Could the overflow behavior be encoded in an
integer's type to make this kind of code mixing explicit (via casts)?

~~~
wyldfire
The code generated for each will be different.

~~~
rcthompson
What I mean is, suppose your module does an arithmetic operation involving a
function from the other module, like "x + other_module_function(y) * 2".
You've enabled panicking on overflow in your module for safety. Is it still
possible that "other_module_function(y)" will return a result that overflowed
without panicking? Does this mean that anyone writing code with overflow
disallowed must carefully choose what modules they call to avoid calling any
that allow overflow?

------
EugeneOZ
Will it be possible to set some flag in Cargo.toml or in compiler arguments to
enforce "saturating" behavior everywhere it is possible, even at the price of
some performance degradation?
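
As far as I know there is no saturating switch, but the related checked
(panicking) behavior can be forced on in release builds through a Cargo
profile; the `overflow-checks` key shown here may postdate this thread, and
older toolchains spelled the same knob `debug-assertions`:

```toml
[profile.release]
overflow-checks = true   # panic on overflow even in release builds
```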

~~~
eridius
Why would you want that? "saturating" behavior is something you want only
rarely. It certainly isn't something you want to use by default for arithmetic
operations.

~~~
zzzcpan
How about just different operators for different intents? Like prefixing an
operator with a special symbol to make it behave differently.

~~~
pcwalton
That's discussed in the article.

"The current state isn’t necessarily the final state of overflow checking: the
RFC even mentioned some future directions. Rust could introduce operators like
Swift’s wrapping &+ in future, something that was not done initially because
Rust tries to be conservative and reasonably minimal, as well as
hypothetically having scoped disabling of overflow checking (e.g. a single
function could be explicitly marked, and its internals would thus be unchecked
in all modes). There’s interest in the latter particularly, from some of
Rust’s keenest (potential) users Servo and Gecko."

~~~
zzzcpan
Yes, I know. But more operators are the solution: they remove all that
ambiguity around multiple intents and simplify a lot of things.

------
Symmetry
There are some architectures that support saturating arithmetic in hardware,
DSPs for example. I wonder if Rust's saturating arithmetic functions get
turned into those automatically?

~~~
vardump
And x86 (SSE).

[http://felix.abecassis.me/2011/10/sse-saturation-
arithmetic/](http://felix.abecassis.me/2011/10/sse-saturation-arithmetic/)

------
eximius
It'd be nice if there were a function annotation for this.

------
nvus
Before reading the article I just have to say that I love the design of your
page.

