I wish the author had dug a little deeper to see why this slowdown occurs.
GCC seems to be generating an out-of-line function call to implement the trapped addition, which is going to inhibit a lot of other optimizations like vectorization.
In C++, this is going to be way more useful than -ftrapv anyway, since in a lot of situations you're dealing with user input and you're probably going to want to throw, not abort.
> Looking at the assembly, I find that the clang compiler generates sensible code on x64 processor, with simple jumps added when the overflow is detected. Meanwhile, GCC seems to call poorly optimized runtime library functions.
Indeed. -ftrapv is no longer the best choice for overflow trapping in GCC anyway; in my experience it's pretty inconsistent in handling e.g. compile-time overflows.
-fsanitize=signed-integer-overflow (part of UBSAN) produces more efficient code, provides better coverage, and is more flexible. By default it provides a diagnostic message; you can instead simply abort with -fno-sanitize-recover, or execute an undefined instruction with -fsanitize-undefined-trap-on-error (equivalent to Clang's -ftrapv). Regardless of choice, addition is performed inline.
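For instance, a minimal sketch (the file name is arbitrary; the command lines just show the flags mentioned above):

    /* overflow.cc -- signed overflow caught at run time by UBSan.
       Build examples:
         g++ -O2 -fsanitize=signed-integer-overflow overflow.cc                        # diagnostic message
         g++ -O2 -fsanitize=signed-integer-overflow -fno-sanitize-recover overflow.cc  # abort instead
         g++ -O2 -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error overflow.cc  # undefined instruction */
    #include <climits>
    #include <cstdio>

    int main(int argc, char **) {
        int x = INT_MAX;
        int y = x + argc;            // argc is 1, so this overflows; the check stays inline
        std::printf("%d\n", y);
        return 0;
    }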
It's worth mentioning that x86 CPUs had a single instruction overflow test that would raise a hardware interrupt in response to an integer overflow. This allowed for extremely low cost integer overflow tests.
Unfortunately, this capability was deleted from x86-64 and we're now stuck with either accepting the risks of integer overflows or suffering a performance hit to test for them by using multiple instructions.
Another extremely useful instruction deleted from x86-64 tested if an array access was in bounds. Determining how many buffer overflow exploits could have been prevented if this instruction was widely used--and carried over to x64--is left as an exercise for the reader.
It's still a single instruction in the hot path: jo.
All such overflow jumps could go to the same handler, in which case it really would just be one instruction overall, but you'd lose the ability to report the exact location of the overflow. A small per-overflow thunk would solve that (it could be as small as a single call instruction).
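A rough C++-level analogue of that shape, assuming GCC/Clang's __builtin_add_overflow; checked_add and overflow_trap are made-up names, and the cold, noreturn handler plays the role of the out-of-line thunk:

    #include <cstdio>
    #include <cstdlib>

    [[gnu::cold]] [[noreturn]]
    static void overflow_trap(const char *file, int line) {
        std::fprintf(stderr, "signed overflow at %s:%d\n", file, line);
        std::abort();
    }

    static inline int checked_add(int a, int b, const char *file, int line) {
        int r;
        if (__builtin_add_overflow(a, b, &r))   // add plus one well-predicted branch
            overflow_trap(file, line);          // cold, out-of-line, knows the location
        return r;
    }

    #define CHECKED_ADD(a, b) checked_add((a), (b), __FILE__, __LINE__)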
Currently the jo doesn't macro-fuse with the addition on Intel chips, but maybe that will change in the future.
I don’t know x86, but, after googling, I would guess that trapping instruction was INTO.
Reading https://xem.github.io/minix86/manual/intel-x86-and-64-manual..., that would give the handler enough information to figure out where the overflow happened. Of course, that would be at the cost of some performance (setting those registers takes time)
Yes, one advantage of INTO over JO is that the interrupt mechanism ensures that you know where you came from, so you can build location reporting on top of that.
However, I would consider that advantage fairly small, and it's not easy to implement in a modular way either, since the way interrupts or signals are handled is a shared global thing, so you have to have process-wide agreement on how this overflow trapping is going to work (or at least an agreement on how to disagree). In a language with a shared runtime where everything plays by the same rules that's manageable, but in low-level native languages like C, C++ and Rust it is more difficult.
A jump-based approach has the advantage that it is standalone, and it really doesn't seem much more expensive than INTO: just the single jump instruction, plus a bit more cold, out-of-line code if you want accurate location reporting.
If/when AMD and Intel start fusing JO with the preceding arithmetic operation, it will have a non-trivial perf advantage over INTO.
It's also IMHO a failure of RISC-V. "We did not include special instruction set support for overflow checks on integer arithmetic operations in the base instruction set, as many overflow checks can be cheaply implemented using RISC-V branches"
I prefer branching. An interrupt is inconvenient, especially if you mean not to panic and die but to do some recovery. A conditional jump after an arithmetic operation can be as cheap as one instruction, branch prediction can make that instruction cost almost nothing in terms of execution time (at least if overflows are a rare thing), and you can point the branch anywhere you like, so you can handle the overflow and try to recover from the error without leaving the current stack frame.
It could get ugly in higher-level languages though. Instead of (a+b)*c you might get something like:
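(a rough sketch using the GCC/Clang __builtin_*_overflow builtins; scaled_sum and recover_from_overflow are just illustrative names)

    int recover_from_overflow();   // hypothetical local recovery path

    int scaled_sum(int a, int b, int c) {
        int tmp, result;
        if (__builtin_add_overflow(a, b, &tmp) ||      // a + b, checked
            __builtin_mul_overflow(tmp, c, &result))   // (a + b) * c, checked
            return recover_from_overflow();            // recover without leaving the frame
        return result;
    }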
No, sometimes overflow is so rare the trap saves you time in the majority case. Also sometimes panicking on hardware trap is the useful thing.
If we had more ALU variety these discussions wouldn't pop up. Saturating arithmetic, bounded arithmetic, etc are all useful and safer by default in most cases.
The question is how much more expensive than an overflow-checked add, which is most likely not as cheap as an unchecked add.
I mean in the end both do a non checked add and then branch on the carry. One does it implicitly and branches to an interrupt and one does it explicitly and branches however you like it.
(Slightly) Increased chip complexity should be included in the choice, too.
I guess the main difference would/could be if used with some form of branch prediction.
It signals int 0x04 if the instruction is executed while the overflow flag is set due to a prior signed arithmetic operation wrapping.
It seems I misspoke somewhat; there is somewhat equivalent functionality in x64 in the JO instructions, but they're six bytes rather than one and the extra size could have cache and/or memory bandwidth impacts under demanding circumstances.
Yeah I knew about jo, and I think add-jo is even fused, but without a trap you have to write code everywhere to handle the overflow which you wouldn’t need if you had a trap. And of course jo doesn’t work for address arithmetic. That’s where the real overhead is.
True but do you care for all sums/muls in your code?
It might be better to check where exactly you can have the overflows and test in the code for those cases. I think checking for overflows over the entire code is useful for testing/validation purposes but not really useful in production.
Or use bigger data types if you're hitting overflows frequently.
BOUND is pretty slow, and requires an odd start/end pair to be placed in memory. I don't see any reason that it would be better than the usual unsigned comparison + branch that languages with bounds checking tend to use.
Besides, much of the difficulty with memory safety in C and C++ is the existence of pointers (or wrappers around them, like iterators), which do not come with an associated length. Length checking machinery can't help with that problem whether special instructions exist or not.
In short, it's doubtful that the availability of these old instructions would make any difference to the practicality of bounds checking on modern x86 machines whatsoever.
INTO would still need to check the flags and you would need to invoke it after every operation that could overflow. You can't configure the integer ALU to raise an interrupt automatically after any overflow.
Is there a reason to not add a new instruction to x86-64 which does interrupt the program on an integer operation overflow so no branching happens in the program? And can this instruction be made just as fast as the unchecked ones?
I know nothing about micro-architectural implementations, but as far as I can remember, all instructions that can trap (load/stores, divs, etc) take more than one clock cycle. Simple sub/adds usually have 1 cycle latency; there just might not be enough latency budget to fit the trapping machinery.
I guess that the trapping doesn't actually need to happen in the critical path (i.e. the results might be available earlier), but this would mean that even ALU instructions cause speculation.
That wouldn't work well for C since signed overflow is undefined, but unsigned overflow is well-defined, with wrapping semantics. Making all existing arithmetic instructions trapping would mean paying trapping costs for well-defined unsigned overflow.
I think this would be a per-thread state which you toggle, defaulting to "off". Then instead of checking and branching after each operation you flip it on/off depending on the type of arithmetic you want. The trick would be to get to the minimal number of toggles and not do the "safe" thing and set it to on/off for each operation.
Generally this sort of global state is considered a misfeature. It is hard to use and impedes CPU design (it is particularly bad for OoO). It is better to have per-instruction flags.
For example, while SSE still has an FPU control word, explicit instructions were added for some stuff that historically was handled by the control register (rounding, for example).
Aborting the program on integer overflow seems so drastic. Say you do that in a server program. Then convincing it to overflow an integer somewhere (possibly somewhere where it is harmless) causes denial of service and all unrelated connections in the process die?
I guess languages with support for exceptions have a more natural action for this, that could be caught somewhere in a server's processing loop to not affect other clients.
This is essentially Swift’s problem on the server. If you have a service running and serving hundreds of clients, then any one of those threads can cause a trivial panic if there’s an overflow and take the entire service down.
As a result, a number of use cases for swift on the server actually use heavyweight out of process threads (i.e. one process per client) so that the failure of one of them doesn’t take down the rest.
Unfortunately there’s no way of overriding the panic (it's implemented as an inline ud2 instruction rather than an overridable function) and while you can install a signal handler to catch it, resuming isn't straightforward and there are some cases where you really can't do anything about it (malloc failure).
This works great for single-user devices where there's only one app running at once and when it crashes the user restarts it, but doesn't work in the server case unless you are using a cgi-bin/lambda pattern where each request is processed by a new process, which is what TF and the AWS lambda approaches do.
This reminds me of Rust unconditionally ignoring SIGPIPE. Except in this case there's a non-trivial amount of pre-existing (and even future) code that assumes a process will be terminated if it attempts to write to a closed pipe. One of Rust's selling points is ease of FFI, so it's largely irrelevant that Rust APIs wouldn't let you ignore write errors. Rust breaks the environment for libraries that rely on the behavior.
It's a bad library design to rely on a process-global setting, because one day you'll have two libraries in your project that rely on opposite values of such a global setting, and then you'll start to see very funny things.
And the library design that relies on the application it's used in being killed by the OS, so that the library doesn't have to tidy things up after itself or properly handle error conditions, is really bad. Don't you just like it when one of your random indirect dependencies, thrice-removed, decides to call std::abort() just because it's tired of living?
I agree libraries shouldn't rely on process-global state. It's also bad library design to trigger signed overflow. In my libraries I try to handle both situations cleanly, but that's all beside the point.
There's also the problem that SIG_IGN handlers are inherited across exec. So Rust programs that exec other programs without restoring the default handler could cause problems in programs where relying on SIGPIPE is more reasonable. This is also true of posix_spawn: SIG_IGN handlers aren't implicitly reset.
Note that Rust doesn't restore any other handlers to their default disposition as far as I can tell. Do you write all your programs to reset signal disposition on startup? I'll admit I don't normally do that unless it's a daemon.
If I were to write a program that relied on "writing to a file might terminate me and that's exactly what I want" behaviour, then yeah, I absolutely would include a call to "signal(SIGPIPE, SIG_DFL)" in the main().
Of course, if I were to actually write a filter-like program (a la cat, head, grep, etc.), I would rather have just checked the return values of my write()s. It's -1? Time to call exit(errno).
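A sketch of that filter shape, assuming SIGPIPE is ignored (as Rust arranges) so a closed pipe surfaces as EPIPE from write() instead of killing the process:

    #include <cerrno>
    #include <csignal>
    #include <cstdlib>
    #include <unistd.h>

    int main() {
        std::signal(SIGPIPE, SIG_IGN);            // broken pipe shows up as EPIPE
        char buf[4096];
        ssize_t n;
        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
            ssize_t off = 0;
            while (off < n) {                     // handle partial writes too
                ssize_t w = write(STDOUT_FILENO, buf + off, (size_t)(n - off));
                if (w < 0)
                    std::exit(errno);             // it's -1? time to call exit(errno)
                off += w;
            }
        }
        return 0;
    }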
One of the post hoc rationales that seems to have cropped up is as you mentioned. But that logic only makes sense if you expect platform-specific code to be rare. I doubt that's the case: from my limited experience there seems to be quite a lot of Rust code written specifically for the Unix environment. Rust is a "systems" language after all; writing platform-specific applications, not simply platform-specific modules or wrappers, is to be expected.
It's a legitimate glitch in Rust. It happens. I don't understand the urge for people to defend Rust here. It violates well accepted norms in the world of systems programming--don't f'up the global environment without being asked. Indeed, part of the reason green threading failed was because it required too much magic interposed between the application and the native platform environment. Rust wasn't trying to go the route of Go.
I’m curious whether a mildly modified Swift compiler that converts all the different ways to abort execution into some sort of panic handler you can override would help work around this issue.
As someone who doesn't know Swift, is there a reason it doesn't handle these errors by throwing exceptions? That approach works well for Java and .Net.
There is a compiler option to disable overflow/underflow checks, so unless the overflow is in some library outside your control, it's not actually an issue.
> Aborting the program on integer overflow seems so drastic.
Disagree. Overflowing of a signed integer type is undefined behaviour, which deserves to be taken seriously.
I was surprised to see that the article doesn't mention the signed/unsigned distinction, or undefined behaviour. Overflowing an unsigned integer type is not a problem; it's defined to wrap around, and may be done intentionally.
I guess people overreact to "undefined behavior". If your software runs always on the same platform then you can know what the behavior would be. It might also be intentional and should not crash your program.
Undefined behavior is used when writing a specification for a compiler that will be used on many architectures and you cannot clearly specify what would happen on ALL architectures for a given operation, because the result might change among these architectures.
Undefined behavior doesn't mean that your CPU has either a chance to suddenly blow away or that it would gain self-awareness and destroy the world.
You can actually predict what would happen if you throw away a rock on earth. You cannot predict it for every planet in the universe, so theoretically "throwing a rock" produces undefined behavior.
> If your software runs always on the same platform then you can know what the behavior would be.
This is not necessarily true, and I consider it to be a harmful misconception.
I think people under-react to "undefined behavior".
What you think the platform would do if you overflowed an integer using machine instructions and what your C or C++ code ends up executing can be wildly different things because C and C++ compilers do everything in their power to optimize your code assuming your signed integers will never overflow. If they do, who knows what state your program is in or which machine instructions the compiler would have scheduled.
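The textbook illustration of that assumption; GCC and Clang typically fold the whole comparison away at -O2:

    // For signed x, x + 1 > x can only be false if the addition overflows,
    // which the compiler is allowed to assume never happens...
    bool will_not_overflow(int x) {
        return x + 1 > x;   // ...so this typically compiles to "return true"
    }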
Fully understanding what a program might do if there is undefined behavior is not trivial. It requires basically reviewing and fully understanding the output assembler for every single build.
Any change in system headers, compiler version, compiler flags, or source code might invalidate previous reviews.
And programs with lurking undefined behavior may not seem dangerous but sufficiently clever exploits can do quite amazing things with the smallest of attack surfaces.
Sure, but in 17 years writing C for a living, with lines of code by the millions, and (embedded) devices running my code all over the world, I have yet to run into an integer overflow bug.
Sure, I don't always write bug-free code (as anyone), but when I say it's not so terrible I mean that UD should be considered a rare case, and it's not worth losing even a single night of quiet sleep over, let alone switching languages or overreacting. Hey, I am more afraid of a flipping bit due to power supply noise.
And I don't want to say either that my code is perfect, or that since it never happened to me it will never happen to you and you can go around triggering UD everywhere since it's not a big deal. But I also like to warn young players that "boooo UD!" is just part of a FUD tactic to get them to adopt some shiny new language.
There is a ton of software out there that is riddled with bugs. Much of it is run behind closed doors, isn't exposed to attack vectors like the internet, and happily churns along because no one is trying to attack it.
Lots of other software might be exposed to the internet or to malicious users but no one has stumbled across it and tried to exploit it yet.
Code like that may be ok for now simply because of the circumstances, but I don't think that's a pass to ignore UB or to discourage folks from trying to improve the status quo.
Because some day either the developer of that code, who is used to ignoring this kind of thing, might write some code that has to be more correct or it will get exploited. Or the code itself might get changed, or exposed to attackers, or copied into some other program.
These things happen and they are preventable. And you don't even have to switch languages - turning on sanitizers when you compile and fixing issues as they are found, and encouraging others to do the same, is better (in my opinion) than categorizing folks worrying about and working on this issue - in C and in other languages - as FUD spreaders.
There are worse things that can happen with badly written code than security bugs. It doesn't necessarily have to be exposed to the internet or to users/attackers for bad things to happen. I truly believe that people just forgot this little fact.
Again, I don’t discourage people from preventing UD, but, again, it’s not a big deal. Especially is not something to be afraid of.
My code is as correct, verified, analyzed and tested as possible, and most of the time compliant with the most absurd certifications. Again, I have yet to find a bug or fail a test because of some strange UD.
Disagree. UB in C can mean your program silently going off the rails in a way that isn't possible in many other languages. UB really can manifest in bizarre and scary ways on real platforms. [0][1][2][3] If it isn't manifesting that way in your case, that's just good fortune.
> I have yet to find a bug or fail a test because of some strange UD
I should hope your safety-critical code is not permitted to contain known instances of unintended undefined behaviour. (There can be exceptions where UB is truly intended. JIT compilation always relies on UB, for instance, not that this would apply to safety-critical code.)
If your program contains undefined behaviour, that means that at best, it works by coincidence. Safety-critical code should not work by coincidence, it should be correct by construction (to steal a term from the formal methods world).
Again, I am not saying "go ahead and don't care about UB". I know what it is; I know what it can do. Just don't make it a big deal. You (plural) make it sound like it's impossible to build a system with C or any other language that allows UB, because it probably runs by coincidence. Yet the world keeps spinning and super safe flashy languages are still running on top of UB-filled OSes written in C.
> I should hope your safety-critical code is not permitted to contain known instances of unintended undefined behaviour.
It's not.
> Also, it's UB, not UD.
Now that you mention it, I checked the other comments I wrote from my phone and noticed it had changed UB to UD.
I don't think we are claiming what you say we are claiming.
I know I'm only responding because of your claim that you can know what your code does even in the face of undefined behavior.
I think this more accurately describes my position:
Code without undefined behavior can absolutely run just fine.
Code with undefined behavior is a problem waiting to happen.
Code with undefined behavior that happens to run the way you want is a happy accident. The longer it runs correctly the luckier you are.
Code in safety systems should not contain undefined behavior.
I would encourage everyone to be aware of undefined behavior, get rid of it when they know of it, and take steps to be proactive against it. For code in safety systems I would require these things if it were up to me.
The point is not to fear undefined behavior (though if my software was used in safety systems, I would fear undefined behavior).
The point is to eliminate it because it can cause problems in even the most well-tested code, either when the code inevitably operates outside the test parameters or when compilers or compile flags or system headers change.
Eliminate it both reactively and protectively. Not only in your own code, but encourage others to do the same.
As jjnoakes and fbkr already commented, this account of undefined behaviour is not correct. You've conflated it with unspecified behavior and implementation-defined behavior, and they're not the same beast. [0]
Undefined behaviour does not mean that the behaviour simply depends on your platform and is otherwise harmless. That's a widespread misconception, and a rather harmful one. You may see that a signed integer overflow appears to wrap harmlessly on your platform, but the compiler is under no obligation to consistently handle signed integer overflow in this way, unless its own documentation makes this guarantee.
> people overreact to "undefined behavior"
I have to agree with jjnoakes that people more commonly underreact to it.
> If your software runs always on the same platform then you can know what the behavior would be.
You cannot infer what non-standard guarantees your compiler is committed to. Unless the compiler guarantees to handle UB in a certain way, it is free to surprise you at any time.
> It might also be intentional
As mentioned above, intentional UB only makes sense if the compiler guarantees to give you the expected behaviour. If that's not the case, it's a bug.
Even then, it's probably best to write portable standards-compliant code.
Either way I'd hope to see a static-assert checking that the expected compiler is being used, loudly breaking the build if this check fails.
> should not crash your program.
This isn't the case. Dereferencing NULL may cause a segfault, terminating your process. Dividing by zero might cause a similar kind of explosion. Depending on platform, freeing a pointer twice might also immediately crash the program (the most helpful way for a platform to handle a double-free).
> Undefined behavior is used when writing a specification for a compiler that will be used on many architectures and you cannot clearly specify what would happen on ALL architectures for a given operation, because the result might change among these architectures.
Not always. The C rule about data-races being undefined behaviour is intended to enable compiler optimisation, by permitting compilers to assume the absence of data-races. This rule would make sense even if C weren't intended to be portable across different hardware architectures.
> Undefined behavior doesn't mean that your CPU has either a chance to suddenly blow away
In principle, undefined behaviour could result in hardware damage, but this seems pretty unlikely unless you're writing power-management code for a motherboard, or something like that. Of course, if you're writing code for a medical system, industrial-control system, or avionics, then undefined behaviour might have terrible consequences.
Self-awareness would be permitted by the C standard, but, again, rather unlikely.
> You can actually predict what would happen if you throw away a rock on earth. You cannot predict it for every planet in the universe, so theoretically "throwing a rock" produces undefined behavior.
I don't see much value in this analogy. It's possible to build a fully deterministic programming language, with no undefined behaviour, and no platform-specific variation. C isn't designed that way, but this is a property of C, not a general fact of computation. It's also possible to write C code entirely devoid of undefined behaviour.
You’re talking about the behaviour in C. Other languages are not C, and can and do define other behaviours.
Overflow is neither wrapping nor UB in Swift, it’s an error. Overflow is never UB in Rust but it might be wrapping or a panic. Overflow doesn’t exist in Python or Erlang. I think overflow is IB in Haskell.
You get the point: that signed overflow is UB and unsigned wraps in C has no bearing on what other languages do.
Swift traps on overflow, regardless of signedness.
Are you sure? IIRC overflow or underflow is always UB per the (earlier?) C standards; they did not expect a single popular representation system for numbers.
Yes, I'm quite sure. C mandates that unsigned integer types use the typical binary representation, and it mandates wrapping arithmetic. I don't know much about the early C standards, it's possible it wasn't always this way.
> Aborting the program on integer overflow seems so drastic. Say you do that in a server program. Then convincing it to overflow an integer somewhere (possibly somewhere where it is harmless) causes denial of service and all unrelated connections in the process die?
If you don't abort then you probably have an arbitrary code execution vulnerability. Overflow is undefined behaviour in C++, your code is in an unanticipated state, a creative attacker can probably leverage that to do anything.
> If you don't abort then you probably have an arbitrary code execution vulnerability.
There are some patterns in which that is totally true, for example if an integer product is passed to malloc or realloc. Those cases tend to get extra scrutiny in security reviews because it is a well known class of issues. However, it is certainly not the case that every case is exploitable.
Not every case is exploitable, but people are bad at reasoning about which cases are exploitable; a lot of DoS bugs that are initially thought to be harmless turn out to be exploitable. So I think it's important to treat any UB as exploitable until proven otherwise.
Yes, but I would take minor issue in that I wasn't saying what should happen so much as what the options are given various tradeoffs.
There are programming communities that have criticisms of exceptions and would like to avoid them. I am sympathetic to that. But if you do have exceptions, this is a good candidate. If you don't have exceptions, making integer overflow trap seems problematic. Posix signals may be another solution, with its own problems and ugliness.
That's the tradeoff at play here. Think of it like debugging a python program, it might fail hours after you start it but if it does fail it's failed for a reason (uncaught overflows can be very very not good)
And that's one reason that when I write code that divides and it's not known ahead of time, I usually check the denominator for zero.
Doing that for every mathematical operation would not be feasible.
Edit: hello good sir or madam downvoter, can someone explain why this is a controversial opinion? In languages where this error does not throw exceptions and in problem domains where the process really needs to stay up, reasonable caution would often mean you might manually check possible denominators for zero.
I read the source code and it seems the array is 1000 elements (with random elements). If each sum supposedly needs 0.1 ns then the whole process takes around 100 ns = 0.1 µs. I do not think you can do a benchmark on these timescales. Then the algo is repeated on the same data 40 times and averaged. This also doesn't help too much.
So the first run puts everything in L1 cache and then it runs much faster.
Anyway, usually anything below 10ms is random, and one should go for at least 1s (or even more).
Modern superscalar processors have more than one execution unit. The latest Intel microarchitecture has 10 execution ports, which means it can theoretically execute 10 μ-ops per cycle. With a 200+ entry reorder buffer, the processor is trying hard to cram those execution units full every cycle. How successful it is depends on dynamic data dependencies and fetch/issue bandwidth. At 3 GHz, to reach 0.1ns latency (on average), it only needs to achieve 3 instructions per clock.
I fully agree these things are difficult and it's not well defined how to benchmark this. Actually, thinking about what you wrote, the sum+= should lead to WAW and RAW conflicts, which might not allow full utilization of the execution ports. It would be interesting to see if having 10 (or 16) temp variables, each accumulating one tenth of the data, would run faster. All conflicts should be solvable then.
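Something like this sketch, where data and n are placeholders:

    #include <cstddef>

    long long sum_unrolled(const int *data, std::size_t n) {
        long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += data[i + 0];   // four independent dependency chains,
            s1 += data[i + 1];   // so more adds can be in flight per cycle
            s2 += data[i + 2];
            s3 += data[i + 3];
        }
        long long s = s0 + s1 + s2 + s3;
        for (; i < n; i++) s += data[i];   // scalar tail
        return s;
    }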
I fully agree with your post. I am just not sure if you are trying to make a point for larger datasets in benchmark (like I did) or you are trying to imply that this is not necessary?
I generally agree the benchmark is too small, but I have serious issues with Lemire's approach to benchmarking in general. Things get pretty nuts at that small of a timescale and things like frequency scaling can make a huge difference. I would draw absolutely zero conclusions on these reported results.
I had that sort of impression as well, but after playing with the benchmark and seeing the assembly code, and checking the actual theoretical throughputs for my microprocessor, I realized that in this case, and previous ones, there was nothing relevant to the results that had to do with the way benchmarking was implemented. These pieces of code would be thought to be in the innermost loop, and practically they might not actually process a very large part of the array. There are solutions, and better approaches, sure, but they pinpoint real issues that cause spread in your performance across microprocessors & compilers that are worth having in mind.
I don't think you realize how much time 10ms is for a computer. Games need to render an entire frame in under 7 ms to hit my monitor's max refresh rate.
I know that 10ms is long and a lot of things can happen. Like Adobe thinks it needs an update, or the Windows scheduler thinks your task needs a pause (that's the worst case); other things can happen as well. Anyway, I usually find short benchmarks unreliable; if you find them reliable, that's good.
> adding two large integers can result in an integer that cannot be represented in the integer type. We often refer to such error conditions as overflows.
There's a subtlety here that's missed in the article: In C and C++ this is only called "overflow" for signed integers. That's because that word specifically refers to the case where the behaviour is undefined, and for unsigned integers the result is always well defined: it is the correct result modulo the size of the integer type (which must be a power of 2). For example, UINT_MAX + 1 is guaranteed to equal 0. In particular, the -ftrapv flag mentioned in the article is described in the GCC docs as (emphasis mine):
> This option generates traps for *signed* overflow on addition, subtraction, multiplication operations.
(Another subtlety, albeit a bit less likely to cause confusion: with signed integers it's possible that the result is negative but doesn't fit into the type, e.g. INT_MIN-1, and this is also called overflow. I mention it because it's tempting to call this "underflow", but that word is reserved for talking about floating point numbers where the result is too small in magnitude to be represented by a non-zero value.)
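That unsigned guarantee can even be spelled as a compile-time check:

    #include <climits>
    static_assert(UINT_MAX + 1u == 0u, "unsigned arithmetic wraps modulo 2^N");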
I don't know Swift at all so I'm quite curious about this bit of the article:
> In a programming languages like Swift, an overflow will result in the program aborting its execution.
Unfortunately, because the article is sloppy about the meaning of "overflow", I don't know how to interpret it. Is it just about signed numbers or not? Also, what other languages are there that are "like Swift"?
Huh. That would make classic LCG PRNGs pretty useless. I guess it's "safer" in that it prevents rare dangerous edge cases at the expense of common everyday safe operations, but that's a tradeoff.
`+` is checked for overflow by default (can be overridden with compiler flags), and `&+` is overflowing addition and does auto-vectorize equivalently to C++.
I'd be quite interested in seeing a benchmark comparing checked and unchecked addition that doesn't get auto-vectorized to understand the performance in practice for non-numerical computing.
I don't believe Rust is anything like Swift. Do you think that by "like Swift" it just meant "languages where an overflow will result in the program aborting its execution"? If so then that was a pretty useless tautology. (I had assumed it meant something like "like Java" meaning any language that uses the JVM.)
> I don't believe Rust is anything like Swift. Do you think that by "like Swift" it just meant "languages where an overflow will result in the program aborting its execution"?
That's literally the sole and entire subject of the article, what else would it mean?
> If so then that was a pretty useless tautology.
The wording is odd but it's a useful classification because it's an extremely uncommon behaviour.
> I had assumed it meant something like "like Java" meaning any language that uses the JVM.
I don't see what your confusion is with my confusion!
Something like "this thing is true about those languages where it's true" would be a totally worthless thing to say: it communicates zero actual information. It doesn't even say there actually is any one language (apart from Swift itself) that satisfies it. Why would the article include an empty statement like that? Even if it's conceivable that it does, surely it's reasonable for me to ask if there's an alternative interpretation that actually had some useful meaning?
> what else would it mean?
It could be that "languages like Swift" refers to a collection of related languages which are pretty obvious if you're familiar with them regardless of this overflow stuff. That is actually the more obvious interpretation to me, because saying that they all share the same overflow behaviour would be meaningful non-trivial information.
I think that's a reasonable thing to think given that "language like X" makes sense for a lot of languages. Maybe Java wasn't the best example, but if someone said "lisp-like" languages or "JavaScript-like languages" you might have an idea what they mean. As I said, I'm not familiar with Swift so for all I know "Swift-like" could make sense too. But it raises the question: what languages would that phrase include? That's why I asked.
After all that mumbo-jumbo I now see your point (not that you're still reading...). I was reading it literally but it was just very badly written. It can be fixed up like this to have the meaning you suggested:
> In some programming languages, like Swift, an overflow will result in the program aborting its execution.
How does this relate to the change in the C++20 standard about two's complement? "signed integers are now defined to be represented using two's complement (signed integer overflow remains undefined behavior)"
That change was simply to reflect de facto reality, removing some undefined behavior in order to make life easier for people writing portable code. I haven’t seen a ones complement machine in decades.
That makes it sound like, in the old spec, you could totally ignore the problem so long as your particular architecture is two's complement. But that's not true: remember that C and C++ are (notoriously) not just high-level assembly, but instead are defined against an abstract virtual machine. What that means in practice is that if your code invokes any undefined behaviour, even if it seems like it ought to work in a particular way given the actual machine it runs on, then the optimiser is allowed to do whatever it likes.
For example, if you convert a number from unsigned to signed and then test `if (x<0)` then the compiler is allowed to optimise away that whole if statement because there's no standards-compliant program that would ever take that path. As another example, if the compiler can prove that a value will be greater than INT_MAX and then you convert it to signed then the compiler is allowed to replace that conversion, and the code following it, with a call to `abort()` – after all, no compliant program could possibly follow that path so it can assume that what it has really proved is that that code will never be executed.
This change to the standard means you don't need to worry about those possibilities, even on machines that actually are two's complement.
The main effect is that you can convert from an unsigned number to a signed number and it will do what you would expect from a two's complement representation. Previously this was undefined behaviour (not even just implementation defined!) if the value was greater than would fit in the signed type. This was true even if you were doing an explicit cast, so doing the conversion in a safe way was notoriously fiddly. [1]
Converting signed to unsigned had always been fine, all the way back to C89 I believe. That was specified as being converted modulo 2^N, which matches what happens naturally with two's complement, even on architectures that aren't. (With the change in C++20 the wording for this conversion is simplified, but the effect is unchanged.)
The introduction of the proposal [2] lists some other changes and non-changes. It mentions that demanding wrapping behaviour in arithmetic operations had been a goal of the proposal but there was strong resistance to it so it was dropped. One thing that stands out is that right shift of negative numbers is now defined as sign-extended rather than undefined behaviour, which I think had probably caught a lot of people out in the past. Oddly, it mentions signed to unsigned conversion as a change even though it's effectively the same as before, and unsigned to signed isn't mentioned in the introduction at all even though that's a major change. The rest are mainly to do with representations e.g. if you're memcopying to or from signed numbers.
Yes. For some time I've wondered about doing a programming language for conservative arithmetic, in which you don't ask for "integer" but specify "32-bit signed, exception on overflow" or "16-bit, do not raise an exception immediately but set a flag which I can check at the end of the function" or "8-bit, use saturation arithmetic".
The exact opposite of the scripting language "everything is just a thing and you can add numbers to strings or whatever idk".
Saturation arithmetic is useful in e.g. audio codecs as it corresponds to "clipping"; there's hardware support for it in some SIMD instruction sets and I think it's required for some compression standards. But compilers won't generate it because there's no way to easily represent it in most languages other than hope for pattern-matching in the optimizer.
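The scalar pattern one hopes the optimizer recognizes looks something like this (whether it actually becomes a hardware saturating add, e.g. via auto-vectorization to PADDSW, depends on the compiler):

    #include <cstdint>

    int16_t saturating_add(int16_t a, int16_t b) {
        int sum = a + b;                           // promoted to int, so no overflow here
        if (sum > INT16_MAX) sum = INT16_MAX;      // clip high, as in audio clipping
        if (sum < INT16_MIN) sum = INT16_MIN;      // clip low
        return static_cast<int16_t>(sum);
    }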
Would also be a good place for "please compile this function with constant execution time regardless of inputs and fail if you can't do that", useful for crypto.
People have written C++ libraries for integer type generation that allow you to specify many properties including various types of arithmetic and runtime behavior. It is pretty transparent to the programmer and efficient.
However, there is real complexity, in terms of library implementation, in figuring out how this zoo of exotic integer types interact with each other consistently and correctly. Doubly so if you want the safety of these interactions to be verified at compile-time to the extent possible.
I think you also want to use the hardware capabilities of the target platform (overflow detection, automatic trapping) as much as possible and at the same time also allow all other integer arithmetic optimizations. Not sure if all of this can be done from within a user library.
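The core of such a type can be small; a sketch only (real libraries also sort out mixed widths, signedness, conversions, and compile-time verification):

    #include <climits>
    #include <stdexcept>

    struct checked_int {
        int v;
    };

    // Pre-checking means the undefined signed overflow is never actually executed.
    inline checked_int operator+(checked_int a, checked_int b) {
        if ((b.v > 0 && a.v > INT_MAX - b.v) ||
            (b.v < 0 && a.v < INT_MIN - b.v))
            throw std::overflow_error("signed integer overflow");
        return checked_int{a.v + b.v};
    }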
Overflow doesn't mean underflow. Overflow means overflow. You can use "negative overflow" for the special case of overflow where a negative number's magnitude is too big. Though why would you care about the difference between positive and negative overflow, really?
To be fair it's not just underflow, but equality that differs between numeric types. The equality of signed and unsigned integers might not mean what you think it means, but it's exact. The equality of real-valued types always means what you think it means, but it's inexact so you need to write it as "close enough to" instead of "equals".
With integers, "overflow" means "wrap around because the internal representation is really on a modulo field" whereas with floating point "overflow" means "the result is NAN or INF" and undeflow means "the result is NAN or zero or DENORM" but it's awkward to use such phrases in everyday conversation. Especially to your boss, whose eyes glaze over when you start using such words and he'll just interrupt with a "when will the JIRA be done? I have another meeting in five."
The rule of thumb is that you have to know what you're doing to use lower-level languages like C++. That already eliminates 80% of the programmers out there but thank goodness there are high-level languages that reduce to simplistic concepts and let software be churned out at a rapid pace.
Note that Rust also traps on integer overflow, and it is trivial to „hack“ the compiler to disable this, and many have done so.
While on this microbenchmark you see a 3x slowdown with the LLVM backend, on large Rust projects like servo, Firefox, the rust compiler, etc. the slowdown is not even measurable.
Also, Rust provides you with unsafe intrinsics to opt-out of trapping in the particular line of code in which it causes an issue.
Rust wraps in release mode and traps in debug mode[0][1]. All large projects that I know of ship in release mode. Though I'd be interested to see a source for it not pessimizing badly.
See my reply to masklin in this thread for why this is incorrect. In release mode, the behavior of integer overflow in Rust is not UB/cannot happen, but modulo two arithmetic.
>> Rust wraps in release mode and traps in debug mode
You wrote:
> In release mode, the behavior of integer overflow in Rust is not UB/cannot happen, but modulo two arithmetic.
The parent didn't claim or imply that overflow is UB. The parent wrote that it "wraps", which is equivalent to your "modulo two arithmetic". You are not in disagreement, so why are you disagreeing?
It is quite believable that trapping does not slow down Firefox. GCC is part of SPECint, and trapping is known not to slow down SPECint GCC. (It does slow down other SPECint benchmarks.)
That’s unfortunately incorrect. That flag does not turn integer overflow into UB, which is what allows the C++ optimizations that assume it cannot happen, and can be used to prove that loops terminate, etc.
That flag changes the behavior of integer overflow from trapping to “modulo 2 arithmetic”.
If you want to change the behavior of integer overflow in Rust to “always assume it cannot happen”, you need to “hack” the compiler AFAIK.
If you believe this is incorrect, provide the flag that proves that (hint: such a flag makes safe Rust unsound and it therefore does not exist).
I find this decision frustrating. They made it clear by making it trap in debug that this is a "bug" but denied the optimization benefits of calling it undefined behaviour on release.
I'm assuming the rationalization is "wrapping is better than undefined behaviour" but I have not seen any evidence that the average program is actually more likely to behave correctly in the face of wrapping.
The combinations that make the most sense to me are:
1. debug: trap, release: trap. Bad performance, correct.
2. debug: wrap, release: wrap. Good performance
3. debug: trap, release: undefined. Great performance.
I guess my point is make up your mind. Is "overflow" allowed and wrapping or a bug?
> but I have not seen any evidence that the average program is actually more likely to behave correctly in the face of wrapping.
Rust takes a simple stance of "safe code should be UB-free".
Yes, you may have logic bugs with wrapping, but you can't cause memory unsafety with it in, because things like array/slice bound checks aren't ever disabled.
Also, C relies on undefined signed overflow for things like "`for` loops incorrectly using `int` instead of `size_t` because it's easier to type", which doesn't really apply to Rust (which require `usize`, the `size_t` equivalent, for indexing, and has pointer-range-based iterators for slices), so I doubt UB overflow in Rust would help performance much.
It would be trivial to change `rustc_codegen_llvm` to set LLVM's `nsw`/`nuw` (in order to make signed/unsigned overflow UB), if you want to prove it improves performance somewhere.
The tl;dr is that (on x86-64) the additional branch to trap has almost negligible cost (processor already sets the flag on overflow, jump is always predicted), but it completely breaks vectorization and loop optimizations, which significantly impacts certain benchmarks.
I once worked on a project that compiled Perl-compatible regular expressions to C code using Ragel + custom modifications that could switch to NFA machines for subexpressions as required. The generated NFA machines required simple stack operations (push and pop of NFA state) on a preallocated array. Originally the PoC attempted no bounds checking, and so could theoretically overflow the array, possibly (likely) corrupting the heap. When I added bounds checking to each generated push operation (a simple test + goto), throughput plummeted by something very significant--two- or three-digit, not single-digit percentages. So I allocated the array with mmap and a guard page. And rather than pick a magic number and call it a day, on SIGSEGV (or SIGBUS?) I would longjmp back to a point where I could grow the array and restart match execution from the beginning. Performance returned to where it was originally.
Fortunately, nothing in the code that might trigger SIGSEGV could leave behind inconsistent state or orphaned resources that might need to be collected. The longjmp from the signal handler was perfectly safe and obeyed all relevant constraints; it didn't implicate undefined behavior.
Unfortunately, neither POSIX nor Linux (nor any Unix, AFAIK) supports per-thread signal handlers, only a global handler per signal. And you can't safely access thread-local storage from a signal handler, anyhow.[1] To distinguish an expected SIGSEGV from an unexpected SIGSEGV, and to support multiple threads, I used sigaltstack. The SIGSEGV handler would query the alt stack. If there was no alt stack, it aborted. If there was an alt stack, it would derive a per-thread data structure stored in a page beyond the top of the stack with an intervening guard page of its own. (sigaltstack is per-thread.) The guard page stored the jmp_buf structure needed for the longjmp, and the array boundaries needed to distinguish NFA state overflow from a real bug. The siginfo_t reference passed to the signal handler provided the fault address.
This was an interesting little subproject because it showed just how performant hardware-based exceptions could be (I never did much kernel or assembly programming), as well as how detrimental conditional branching could be to performance. And it was almost certainly the only time I'll have a truly legitimate reason to catch SIGSEGV, as opposed to previous cases I've seen in some other projects that had the horrible idea of attempting to recover from a truly invalid and uncontrolled memory access.
[1] POSIX doesn't permit it, though in practice it depends on the libc. musl supports it because it preallocates all TLS when creating a thread, but it's unsafe on glibc as glibc lazily allocates TLS. We were using glibc, and while theoretically I could have induced TLS slot initialization, thread-safe reentrancy of internal machinery was an area where glibc actually had (and has, last time I checked) quite a few open bugs, so I wasn't about to play games there.
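A stripped-down sketch of the guard-page trick, for illustration only: it's POSIX-specific, leaves out the sigaltstack/per-thread machinery described above, and the handler doesn't verify the fault address the way the real code did:

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf grow_point;

    static void on_segv(int) {
        siglongjmp(grow_point, 1);   // real code checks siginfo's fault address first
    }

    int main() {
        const long page = sysconf(_SC_PAGESIZE);
        const size_t capacity = 4 * (size_t)page;

        // Reserve the NFA stack plus one PROT_NONE guard page: running off the
        // end faults immediately instead of silently corrupting the heap.
        void *mem = mmap(nullptr, capacity + page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        unsigned char *base = (unsigned char *)mem;
        if (mprotect(base + capacity, page, PROT_NONE) != 0) return 1;

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_segv;
        sigaction(SIGSEGV, &sa, nullptr);

        if (sigsetjmp(grow_point, 1)) {
            puts("hit the guard page; grow the array and restart the match here");
            return 0;
        }

        for (size_t i = 0;; i++)        // pushes with no per-push bounds check;
            base[i] = (unsigned char)i; // the guard page is the bounds check
    }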
The problem with integer overflow, in the general context of UB, isn't that it's expensive to trap.
The problem is that the C (and C++) specs insist on overflow being undefined behavior. Then compilers insist on abusing the UB as meaning that they get to do whatever they want.
This particular case is only partially the compiler's fault, as the various specs should not be defining well specified behavior as being undefined.
Fully the compiler's fault is the treatment of UB as meaning "now I can do whatever I want", which turns the compiler into a security adversary. I have yet to see a non-microbenchmark, non-SPEC benchmark that demonstrates a real-world win by subverting the developer's intent.
C compilers should assume that every bit of code is necessary. The only dead code is that which is predicated on a compile-time constant, as in:
    if (0) {
        /* safe to optimize this away */
    }
Code which tests a run-time condition must always be assumed to be doing that for a reason.
Just provide excellent code generation: great peephole optimizations, jump threading, instruction selection: all the "classics". Try to put local variables into registers nicely. Treat every pointer dereference as a load or store.
No fucking bullshit like, "oh, this pointer was dereferenced four lines up from here so this null pointer check can be removed".
Because, like, maybe that dereference is a simple mistake, and not an iron-clad logical proposition asserted by an infallible programmer.
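The pattern in question, where widget and use are stand-ins:

    struct widget { int count; };
    void use(int);

    void f(widget *w) {
        int n = w->count;   // dereference first -- maybe just a simple mistake
        if (w == nullptr)   // ...but now the compiler may delete this check,
            return;         // since a null w would already have been UB above
        use(n);
    }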
C was originally designed as a language in which optimizations were up to the programmer. If you wrote clever code like while (* d++ = * s++); to copy a string, that was not just for brevity; it produced better code. That was the idea. The compiler was dumb, and so the fact that in that entire expression, s and d appear only one time meant that they were only accessed one time in the generated code. The programmer was the optimizer. Terse code stuffing multiple effects into a single expression ran faster. And by that same token, no pun intended, code that was less terse was safer. It made all the memory accesses and evaluations that it looked like it was doing, in the order it was doing them.
You're essentially describing a C compiler with advanced optimisations disabled, no?
If you want improved safety and fewer footguns, the solution is to use a safer language than C, rather than to try to declaw a fundamentally unsafe language. Even MISRA C fails to guarantee the absence of undefined behaviour. (Other far more involved projects have made this a goal. [0][1]) The obvious candidates are Ada, Rust, and Zig.
It doesn't make sense to mandate the absence of serious compiler optimisations, especially for a language like C.
It absolutely makes sense to mandate the absence of serious compiler optimizations in an unsafe language, in which programs frequently stray outside of the specification, and in which the programmer has excellent tools for optimizing by hand.
You can write C that is nearly impossible not to generate into a nearly optimal instruction sequence with just basic optimizations that do not try to get clever with deductions arising from multiple disconnected facts coming from separate statements in the program.
There is a lot of optimization latitude without doing crazy things. You can turn a switch statement into a jump table without causing a problem, for instance. Unrolling loops is fairly safe. You can allocate registers well; you can improve the instruction sequences. You can thread jumps. You can fill branch delay slots.
I don't need the compiler to delete a line of code for me. If I actually delete it myself in the source code, that will shave off just as many cycles!
Now if the compiler would simply warn "that null pointer comparison is being done on a pointer that was dereferenced earlier", that would be useful. Instead of optimizing behind my back, give me diagnostics that I can use to change the program to make it faster. I could look at that code and see: by golly, that is right; the pointer is guaranteed not to be null in that code, so we can drop the check. Or, oops, no; that dereference should not be happening before the check!
You can't have compilers guessing about the purpose and intent of disconnected statements in relation to each other, under the assumption that there is no mistake anywhere. Even philosophically, that is stupid.
Advanced optimizations arising from piecing together a deduction from multiple assumptions and facts should all take the form of diagnostic suggestions about how to change the source code.
> It absolutely makes sense to mandate the absence of serious compiler optimizations in an unsafe language
C is intended to run terribly fast, so optimisations are vital. It doesn't intend to hold your hand, so the possibility of surprising consequences from undefined behaviour (subtle programmer mistakes) are permissible under its philosophy.
There'd be little point making C slower without making it safe, that's why no-one has done it.
> You can write C that is nearly impossible not to generate into a nearly optimal instruction sequence with just basic optimizations that do not try to get clever with deductions arising from multiple disconnected facts coming from separate statements in the program.
That doesn't sound right. Plenty of platform-specific optimisations do not work well at the level of C source code, and for instance cannot be performed by source-to-source optimisers. Generation of prefetch instructions, branch-prediction hinting, auto-vectorisation, computing both division and modulo using a single instruction. C isn't an assembly language.
> If I actually delete it myself in the source code, that will shave off just as many cycles!
It's a good thing that compilers are capable of deep inlining and removing code that is dead for a given context.
> if the compiler would simply warn "that null pointer comparison is being done on a pointer that was dereferenced earlier", that would be useful
We already have linters/static analyzers/compiler warnings for C.
> Even philosophically, that is stupid.
C doesn't pretend to be safe. Annoyingly, it can't really even be effectively checked with runtime checks on debug builds. (It's quite hard work to check against out-of-bounds array access. I believe Valgrind manages it for many cases.) I agree it's a valid criticism of the language given that we don't seem to get all that much in return. Safer languages like Ada can be used for everything C is used for, for roughly zero performance penalty.
> C is intended to run terribly fast, so optimisations are vital.
Firstly, says who? The first implementations of C, or its predecessor, by Dennis Ritchie, compiled into threaded code: a sequence of function calls. He simply wanted to work at a higher level than assembler.
Secondly, which optimizations?
Optimizations like "this loop must be infinite because i is incremented from 0 and going past INT_MAX would be undefined behavior; and therefore the code after is unreachable and can be deleted" are not in "vital" in any shape or form.
That kind of thing points to the work of an ego that took the shortest possible career path into becoming a compiler committer, long before acquiring the required maturity.
> It doesn't intend to hold your hand
Modern C compilers are crossing from the "not holding your hand" territory into "knocking a tool out of your hand" territory.
> It's a good thing that compilers are capable of deep inlining and removing code that is dead for a given context.
Removing code because of "if the behavior of prior code was written by a perfect programmer who follows ISO C to the letter, this line cannot be reached" is definitely not a good thing.
Removing code because of some constant expression, or an expression that is always true due to type (like x < 0, x being unsigned), is certainly a good thing.
Well, the modern software world. Performance is one of C's great advantages. Poorly optimising C compilers do exist, such as obscure compilers targeting obscure architectures, but much C code is written for speed. Kernels and game-engines are written in C and C++ partly for performance reasons.
(Although, again, it's not as if we have to keep using C. Much safer alternatives do exist that don't have significant performance penalties, and it's a pity they don't get more traction.)
> Optimizations like "this loop must be infinite because i is incremented from 0 and going past INT_MAX would be undefined behavior; and therefore the code after is unreachable and can be deleted" are not "vital" in any shape or form.
I believe the assumption of no UB can be important in auto-vectorisation, a very significant optimisation.
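The usual illustration (a sketch of my own; the function name is made up): a 32-bit signed index with a 64-bit bound. Because signed overflow is undefined, the compiler may assume i never wraps, widen it to a 64-bit induction variable, take the trip count to be n, and vectorise; if int overflow were defined to wrap, the case where i wraps back to a negative value would have to be preserved.

    /* Signed-overflow-is-UB lets the compiler treat i as a simple
       0..n-1 counter and widen it for vectorisation. */
    void vadd(double *restrict dst, const double *restrict a,
              const double *restrict b, long n) {
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }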
There's a quantifiable performance advantage to any optimisation. The compiler vendors found it worth pursuing, so there must be something to it. These optimisations can be disabled, but this will harm performance.
As to whether it's 'worth it', that's a matter of opinion. I still think that if you want a safe language, you should just use a safe language. C is not a safe language, and isn't trying to be. Even without these specific optimisations, there are still plenty of other ways buggy C code can still cause havoc. The only way to properly tame C is with projects like ZZ and Armada, as I previously linked.
> That kind of thing points to the work of an ego that took the shortest possible career path into becoming a compiler committer, long before acquiring the required maturity.
It's not reasonable to diagnose an engineering decision as a personality disorder. Also, I believe most of these decisions were made by committee, but I'm not well read on C's early days.
They decided to make C the kind of language that places a great burden on the programmer, regarding subtle errors. We agree that this is a real downside of the language, but it's not without its advantages. C is the gold standard for performance, and has been for decades. The presence of UB has probably been somewhat helpful to that end, and has also helped C's portability.
(Again though, modern compilers can optimise Ada very effectively. With modern compilers, I don't think C's UB is all that advantageous for performance.)
> Modern C compilers are crossing from the "not holding your hand" territory into "knocking a tool out of your hand" territory.
That's a rather good metaphor for aggressive optimisers. I'm reminded of a comment on programming languages: C treats you like an adult. Pascal treats you like a child. Ada treats you like a criminal.
> Removing code because of "if the behavior of prior code was written by a perfect programmer who follows ISO C to the letter, this line cannot be reached" is definitely not a good thing.
> Removing code because of some constant expression, or an expression whose truth value is fixed by its type (like x < 0, x being unsigned, which is always false), is certainly a good thing.
This strikes me as a distinction without a difference. It's the same optimisation: elimination of unreachable code. The compiler is always permitted to assume absence of undefined behaviour, that's how undefined behaviour is, well, defined. If there's any UB present in an execution at all, all bets are off. I think you're suggesting a different approach would be better, but I'm not sure what you're proposing.
We could try to define a language similar to C but devoid of undefined behaviour. Part of the spec could be adjusted to use indeterminate values instead, such as for signed integer overflow. We could define divide-by-zero as instantly terminating the process. (The C++ language takes this approach very occasionally, [0] but defining things this way is troublesome for embedded systems compilers.) This language wouldn't be as easy to compile, though. Features like C's unsafe pointer arithmetic, and unsafe function pointers, seem hard to sanitize. I wish I knew Ada better for comparison.
> It's the same optimisation: elimination of unreachable code.
No it isn't, because in the one case, if you don't eliminate the code, it turns out to be reachable. The reason is that the logical predicate by which it was deduced to be unreachable ("the program is perfectly written and doesn't invoke undefined behavior") is not actually true.
So, that is to say, this code "for (i = 0; i >= 0; i++);" will actually terminate due to wraparound past INT_MAX, and the statements which follow it are reached, because that same compiler generates wraparound behavior for int.
It's not the same as "if (1) return;" where the basis for removing the subsequent code is simply true.
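Concretely (a minimal sketch of my own, not taken from either compiler's documentation):

    #include <stdio.h>

    int main(void) {
        /* At -O0 on two's-complement hardware, i eventually wraps to
           INT_MIN, the condition fails, and "reached" is printed.
           Under the UB rules an optimizer may instead conclude the
           condition can never become false, treat the loop as
           infinite, and discard everything after it. */
        for (int i = 0; i >= 0; i++)
            ;
        puts("reached");
        return 0;
    }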
So you're suggesting restricting the optimisation to trivial cases only, greatly hampering what it can do.
Again, your suggestion is imprecise. Would you want undefined behaviour eliminated from the language entirely? If yes, deep changes would need to be made. If no, there's little point, as it's still a highly unsafe language.
Small, but vocal! I think I share kazinator's preferences here. For a C++ compiler, maybe you need dead code elimination. But for C, I'd prefer a warning accompanied by reliable execution.
I think people underestimate the extent to which things like "dead code" elimination matter. Basic, table stakes optimizations like constant propagation and inlining result in a ton of scenarios like the if (0) { } thing the GP mentions, even if such constructs don't explicitly appear in the code.
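A sketch of how such a branch appears without anyone writing it (hypothetical names, my own example):

    /* A generic helper, written defensively. */
    static inline int clamp_index(int i, int n, int checked) {
        if (checked) {                 /* caller decides whether to check */
            if (i < 0)  return 0;
            if (i >= n) return n - 1;
        }
        return i;
    }

    int lookup(const int *table, int i) {
        /* After inlining with checked == 0, constant propagation turns
           the check into "if (0) { ... }" -- exactly the construct
           dead-code elimination cleans up, even though "if (0)" never
           appears in the source. */
        return table[clamp_index(i, 256, 0)];
    }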
People know what they don't like and so they logically suggest that this optimization should not occur, but it is hard to understand the extent by which this type of optimization is helpful in other places and perhaps how it is intertwined with other useful optimizations.
Dead code elimination based on iron-clad truths is important; truths like constant expressions, or expressions that evaluate true due to the range of a type or whatever.
Elimination of some code block B based on some cockamamie hypothesis about some statement A being well-defined is not important.
Maybe that single optimization is not particularly important. In any case, you can turn it off with -fno-delete-null-pointer-checks, although I'd recommend against accessing null pointers in any case (you are often at the mercy of the optimizer anyway: hoping it generates non-trapping code).
I've never bought this argument. If wrapping signed arithmetic (-fwrapv) were the default, and an overflow occurred, then most programs would still not function as intended.
Basically your argument is that programmers should be implementing:
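(The original snippets aren't reproduced here; presumably hand-rolled checks along one of these lines. The function names are mine, for illustration only.)

    #include <limits.h>
    #include <stdlib.h>

    /* "Solution" 1: pre-check against the limits before adding. */
    int add_precheck(int a, int b) {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            abort();
        return a + b;
    }

    /* "Solution" 2: add first, then look for wraparound -- only
       meaningful if -fwrapv makes signed overflow wrap. */
    int add_postcheck(int a, int b) {
        int sum = a + b;
        if ((b > 0 && sum < a) || (b < 0 && sum > a))
            abort();
        return sum;
    }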
Both of these solutions are garbage, and blaming the compiler because your language (C) is too weak to help you out is shrugging off responsibility.
In places where overflow is likely to matter from a security perspective (parameters derived from user input for example), what mode is used is irrelevant. It's actually easier for a programmer to check for overflow using a safe compiler intrinsic[0] (or in C++ use a checked[1] or multiprecision[2] integer library) than it is to correctly implement their own checks.
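For example, with the GCC/Clang builtin (a sketch; the wrapper name is mine):

    #include <stdbool.h>
    #include <stdio.h>

    /* __builtin_add_overflow returns true if the mathematically
       correct sum does not fit in *out. */
    bool add_checked(int a, int b, int *out) {
        return !__builtin_add_overflow(a, b, out);
    }

    int main(void) {
        int sum;
        if (add_checked(2000000000, 2000000000, &sum))
            printf("sum = %d\n", sum);
        else
            puts("overflow detected");
        return 0;
    }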
Implementation defined means that the behavior is defined. This is patently not what the existing behavior of compilers in the face of integer overflow is.
Signed integer overflow is decidedly not implementation-defined behavior in GCC, clang, recent MSVC, ...
I think you misunderstand the meaning of "implementation-defined behavior". It means the implementation chooses a particular behavior and documents it does so. For example, a compiler might choose to always wrap on signed overflow and put that in their manual. That's not what's happening.
No, both define it by default on every platform, with no flags needed. Flags allow you to force behavior one way or another.
Here [1], for example, is the output from godbolt demonstrating exactly this. Note there are no flags specified, it produces valid platform specific code, and it does so for every platform listed. You can then set flags if you desire, but there is a default, as I stated.
Yes, which is exactly what I stated. My response was to “Implementation defined means that the behavior is defined. This is patently not what the existing behavior of compilers in the face of integer overflow is.”, implying compilers didn’t have implementation defined behavior. For integer overflow, they do. You just noted this compiler chose an implementation defined behavior. And you don’t need flags for it to choose.
You just stated “Neither compiler defines it unless you give it flags to specify a certain behavior”. It does define it, exactly as I said, per compiler per platform. No flags made any of them produce implementation defined output. Flags make them produce specific, cross-platform behavior.
The compiler has not committed to a behavior, it is free to compile that code to trap the next time you try if it so wishes. "Implementation defined" means that on this particular implementation the compiler will always do the same thing.
> Are you claiming if I compile the same code with the same version of the compiler on the same platform the behavior changes from compile to compile?
Your claim ("the same code on the same platform compiled with the exact same compiler generates the same machine code") is a very weak property. Notably it doesn't mean that integer overflow behaves consistently, because it doesn't; the compiler assumes signed integers will never overflow and optimizes on that, so in one spot you might get wrapping, but in another spot the compiler removes your bounds checking, all in the same piece of code, on the same platform, compiled with the exact same compiler.
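The classic illustration of that inconsistency (a sketch of my own): a hand-written wraparound guard that the optimizer is entitled to delete, while a plain overflowing addition elsewhere in the same binary simply wraps.

    /* Intended as an overflow guard, but a + 100 overflowing is UB,
       so the compiler may assume the condition is always false and
       remove the branch entirely. */
    int bump(int a) {
        if (a + 100 < a)
            return -1;          /* "can't happen", says the optimizer */
        return a + 100;
    }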
You probably know this, but by making it UB, compilers don't need to have the same behavior on every case of integer overflow. This allows compilers to apply many more optimizations.
What you're after is specified behavior. C/C++ essentially has specified, unspecified, and undefined behavior; I can't recall the correct terminology.
Basically (in order), universally defined behavior, architecturally defined (under/overflow semantics, argument evaluation order, etc), things that cannot be defined (UaF, buffer under/overflows, etc).
The problem is that a depressing amount of undefined behavior is unnecessarily undefined. UB should be specifically limited to things that cannot have a predictable/defined behavior on any arbitrary platform - e.g. memory errors, etc.
Measuring this is tricky for a macrobenchmark because most codebases tend to split into “give me -fwrapv or give me death” or “I care about avoiding undefined behavior so I insert checks myself”. You can’t just flip compiler flags while benchmarking either group: for the first you may break the program, and for the second the existing checks will prevent wrapping from occurring regardless.
For what it’s worth, the usual argument given for overflow being undefined is that it makes loop analysis much easier, and as a result it allows for better hoisting, strength reduction, and autovectorization.
The big thing that isn't talked about is that forcing a vendor-specific definition onto the undefined behaviour of signed integer overflow loses significant optimization opportunities.
The danger of signed integer overflow in C++ is not that your less-than-zero comparison is not working in a reverse for-loop (although, to be fair, that's a danger encountered by newbies). The big problem is that the compiler will optimize away huge chunks of code when it detects a possible signed integer overflow resulting in weird and unexpected behaviour at runtime. It's not really an exploitable security issue so much as a massive cost issue from stretched out development time or increased support costs (depending on how well your code gets tested before release).
>The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc.
I suspect something's not exactly right with compiler flags.
Then again, how much of your program is integer arithmetic?
A 12x slowdown on 0.1% of the runtime takes it to 1.2%, i.e. roughly a 1% overall slowdown.
What I find annoying is that we go back and forth over this because of the insistence on clinging to the one true holy int type as on God's Blessed Machine, the PDP-11, instead of implementing proper variable types like you have with Ada.
Seriously, considering code can be compiled for 16-, 32-, and 64-bit machines, 'int' is dodgy as hell.
Overflow for signed types is undefined behavior. Hence the compiler can trap when it happens and remain standards compliant.
For unsigned types, overflow is defined behavior, and hence -ftrapv will not trigger on overflow.
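A small illustration of the difference (my own sketch; the exact failure mode depends on which trapping option from upthread you use):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned u = UINT_MAX;
        u = u + 1;                /* defined: wraps to 0, never traps */
        printf("u = %u\n", u);

        volatile int one = 1;     /* keeps the overflow visible at
                                     runtime rather than folded away */
        int s = INT_MAX;
        s = s + one;              /* UB: where -ftrapv aborts, or where
                                     -fsanitize=signed-integer-overflow
                                     reports the overflow */
        printf("s = %d\n", s);
        return 0;
    }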
An interesting flip side of this is that using signed types can be a lot faster because the compiler is allowed to assume that overflow will not happen, which lets it apply more optimizations.
Since most compiler vendors provide an implementation-specific way to define the undefined behaviour of signed integer overflow as a vendor extension, you will need to resort to compiler-specific means to control that.
For example, in GCC you could surround the offending code with `#pragma GCC push_options` / `#pragma GCC optimize ("no-trapv")` and a closing `#pragma GCC pop_options`.
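A sketch of how that might look (whether a given GCC release honours -ftrapv/-fno-trapv on a per-function basis via the optimize pragma is worth checking against its documentation):

    /* File built with -ftrapv; let this one hot function opt out. */
    #pragma GCC push_options
    #pragma GCC optimize ("no-trapv")

    int wrapping_add(int a, int b) {
        return a + b;       /* intended not to trap here */
    }

    #pragma GCC pop_options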
Writing (compiler) microbenchmarks is notoriously hard. Overall, no benchmark should run for less than two minutes or so, to account for context-switching noise at the least. It should run at a fixed CPU frequency (no turbo boost, etc.), pinned to fixed CPU core(s) too. It should be aware of L1/L2 cache sizes - data crossing L1/L2 boundaries on different processors produces disproportionate results.
The difference between the compilers is too large, and the author should at least look at the disassembly of the generated code. Imo, a bad article.
At what performance differential are those factors relevant? For a 5% difference I would share your scepticism. However, the effect size is a 4x difference (3x slowdown vs 12x slowdown). To me, it seems very unlikely to be a result of inconsistent benchmarking. However, I am not an expert on the subject.
The test is too short to measure anything of use. Also, you want the same code run multiple times and the results collected; at the least you'd want the standard deviation across runs.
I think unless you own the entire machine almost all microarchitecture level benchmarking is playing with fire because of all the things you've mentioned. Even if you use perf_event to get as low as the OS allows you to account for things that'll slow you down there's still overhead associated with that AFAIK
Also, code disassembly is a fairly awful way of assessing performance unless you mean for the purposes of the article
The article, in this case; hence the new paragraph. I used to routinely do this for Java, with -XX:+PrintAssembly, to ensure certain bounds checks were eliminated by enforcing that the index fell within [0, array.length). A simple test that always succeeded improved zip compression by 15%, for instance.
Not going into details, but verifying the results of microbenchmarks against the assembly is quite useful - of course usually limited to very hot spots. And of course it's about owning the hardware and running on bare metal with fixed frequency (core/uncore) plus set voltage(s). Removing as many variables as possible is important, just as knowing what to test and how it is affected by the hardware.
GCC seems to be generating an out-of-line function call to implement the trapped addition, which is going to inhibit a lot of other optimizations like vectorization:
https://gcc.godbolt.org/z/zj6qbn
By contrast, Clang is not:
https://gcc.godbolt.org/z/r7avno
Btw if you only need overflow checking in a few places, you can use overflow detecting intrinsics:
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...
In C++, this is going to be way more useful than -ftrapv anyway, since in a lot of situations you're dealing with user input and you're probably going to want to throw, not abort.
https://gcc.godbolt.org/z/5qvTf9