In some significant ways, it's not strong at all. It's stronger than JavaScript, but it's difficult not to be. Python is a duck-typing language for the most part.
Duck typing is an aspect of it being dynamically typed, not whether it is strong/weak. But strong/weak is not formally defined, so if duck typing disqualifies it for you, so be it.
-ffunction-sections and -fdata-sections (together with --gc-sections at link time) would be needed at a minimum to strip dead code. But even with LTO it's highly unlikely this could be trimmed unless all format strings are parsed at compile time, because the compiler wouldn't know that the code won't be asked to format a floating-point number at some point. There could be other subtle things that hide it from the compiler as dead code.
The surest bet would be a compile-time feature flag to disable floating-point formatting support, which the library does have.
Still, that's 8 KiB of string-formatting library code even without floating point and with a bunch of other optimizations applied, which is really heavy in a microcontroller context.
I think this is one scenario where C++ type-templated string formatters could shine.
Especially if you extended them to indicate assumptions about the values at compile time. E.g., possible ranges for integers, whether or not a floating point value can have certain special values, etc.
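Roughly what I have in mind (a made-up sketch, not from any existing library): carry the assumed range in the type, and the formatter can size its buffer and choose its conversion path at compile time.

#include <cstdint>

// A value whose possible range is part of the type.
template <int64_t Min, int64_t Max>
struct Bounded {
    int64_t value;  // caller promises Min <= value <= Max
};

// Worst-case number of characters needed for any value in [Min, Max].
constexpr int max_chars(int64_t min, int64_t max) {
    int64_t worst = (max > -min) ? max : -min;
    int n = (min < 0) ? 1 : 0;  // room for a minus sign
    do { ++n; worst /= 10; } while (worst > 0);
    return n;
}

// Format a non-negative bounded value. The buffer requirement is a compile-time
// constant, and no generic 64-bit or floating-point conversion path is needed.
template <int64_t Min, int64_t Max>
int format_to(char* out, Bounded<Min, Max> v) {
    static_assert(Min >= 0, "this sketch only handles non-negative ranges");
    constexpr int kMax = max_chars(Min, Max);
    char tmp[kMax];
    int n = 0;
    int64_t x = v.value;
    do { tmp[n++] = char('0' + x % 10); x /= 10; } while (x > 0);
    for (int i = 0; i < n; ++i) out[i] = tmp[n - 1 - i];
    return n;
}

// e.g. a battery percentage never needs more than 3 characters:
//   Bounded<0, 100> pct{87};
//   char buf[4];
//   int len = format_to(buf, pct);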
> it’s highly unlikely this could be trimmed unless all format strings are parsed at compile time
They probably should be parsed at compile time, like how Zig does it. It seems so weird to me that in C and C++ something as simple as format strings is handled dynamically.
Clang even parses format strings anyway, to look for mismatched arguments. It just - I suppose - doesn’t do anything with that.
In C++ they can be passed at compile time via template arguments and/or constexpr/consteval. Even then, there can be all sorts of reasons a compiler isn't able to omit something as deeply integrated as floating-point formatting from a generic formatting library. Rust handles this more elegantly with Cargo features, so you can explicitly guarantee you've disabled floating point altogether in a generically reusable and intentional way (and whatever other features might take up space).
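Here's a hand-rolled sketch of the consteval side of that (just the idea, not how fmt or std::format actually implement it): the placeholder count becomes a compile-time check, so a mismatch fails the build instead of surfacing at runtime.

#include <cstddef>
#include <string_view>

consteval std::size_t count_placeholders(std::string_view fmt) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i)
        if (fmt[i] == '{' && fmt[i + 1] == '}') { ++n; ++i; }
    return n;
}

template <std::size_t NArgs>
struct checked_fmt {
    std::string_view text;
    consteval checked_fmt(const char* s) : text(s) {
        // Runs during constant evaluation; a mismatch makes this a
        // non-constant expression and therefore a compile error.
        if (count_placeholders(text) != NArgs)
            throw "placeholder count does not match argument count";
    }
};

template <class... Args>
void print(checked_fmt<sizeof...(Args)> fmt, Args... args) {
    (void)fmt;
    ((void)args, ...);  // the actual runtime formatting is omitted in this sketch
}

// print("x = {} y = {}\n", 1);     // fails to compile: 2 placeholders, 1 argument
// print("x = {} y = {}\n", 1, 2);  // ok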
It's also important to note that the floating-point code only contributed ~44 KiB of the 75 KiB, but they stopped once the library got down to ~23 KiB and then removed the C++ runtime completely to shave off another ~10 KiB.
However, it’s also equally important to remember that these shavings are interesting and completely useless:
1. In a typical codebase this would contribute 0% of overall size and not be important at all
2. A codebase where this would be important and where you'd care about it (i.e. embedded) is not served well by this library eating up at least 10 KiB even after significant optimization; that intractable 10 KiB is still too large for this space when you're working with a maximum binary size of ~128-256 KiB (or sometimes even less).
Not me but a friend. Things like making electronics for singing birthday cards and toys that make noise.
But there are plenty of other similar things - like making the code that determines the flashing pattern of a bicycle light or flashlight. Or the code that does the countdown timer on a microwave. Or the code that makes the 'ding' sound on a non-smart doorbell. Or the code that makes a hotel safe open when the right combination is entered. Or the code that measures the battery voltage on a USB battery bank and turns on 1-4 indicator LEDs so you know how full it is.
You don't tend to hear about it because the design of most of this stuff doesn't happen in the USA anymore - the software devs are now in China for all except high-end stuff.
Hotel safe might, if it logs somewhere (serial port?).
The others may have a serial port set up during development, too. If you have a truly small formatter, you can just disable it for final builds (or leave it on, assuming output is non-blocking; if someone finds the serial pins, great for them), rather than having a larger ROM for development and a smaller one for production.
Mostly used for debugging with "printf debugging" - either on the developer's desk, or in the field ("we've got a dead one. Can you hook up this pin to a USB-serial converter and tell me what it's saying?")
Below a certain threshold, "quick calls" are the best thing to do.
Some of the most inspiring discussions come from someone bringing up an issue they have right there and then. Good discussions often start from something that doesn't seem so important, that doesn't have a clear outline from the start.
If there isn't the possibility to start a discussion instantly once in a little while, there is a good chance it won't ever happen.
>Good discussions often start from something that doesn't seem so important, that doesn't have a clear outline from the start.
Yes, but I've never once had a "can we hop on a quick call?" turn into that. Almost every single one of those that I've encountered could've been settled in a handful of chat messages. The things you describe happen to me during unplanned asides in meetings, long-term general chat channels, or impromptu in-person chatting.
My impression has been that the people who want a quick 1-on-1 call just don't like typing.
> My impression has been that the people who want a quick 1-on-1 call just don't like typing.
Yes, similar to my ex (can't get rid of her completely, shared custody of the kids) who always insists on sending voice messages. She just doesn't want to type and keeps sending voice messages after I've told her several times that I hate it because it makes looking for past information super difficult.
By the way, wasn't there a Slack-like app that worked as a kind of walkie-talkie system within a team?
You're right that it happens much less with calls than in the office, where there is the important signal of physical presence. Personally I get very few "can we have a quick call" requests at all. But I feel like it's a culture thing; people shouldn't worry about making a quick call proactively once in a while. And maybe acceptance will shift so that remote workers are expected to, e.g., keep a camera always on. I'm not entirely decided but tend to think it would be a good thing. Maybe it would be more accepted if the person being looked at got an instant signal about it.
> Below a certain threshold, "quick calls" are the best thing to do.
Sure, but give the person you're asking for a quick call the context to decide whether they also think it's a quick call. All the article is basically asking is that you provide enough context when contacting people.
If you do text, you only help once. The problem is solved for everyone else, and you leave a trace for people searching for similar problems.
On the other hand, with secret 1:1 calls that leave no trace and help one and only one person, you can appear busy and productive while also helping people in secret, and do so multiple times for the same or similar problem. You reap the rewards multiple times.
And since people never put any initial effort at all (they just say "hi do you have a minute?" instead of describing their issue), if multiple people have the same problem right now, then you have plausible deniability for your multiple calls because you didn't know that they had the same issue, so that's why all calls were separate instead of solving them all at once.
Don't even mention the fact that others can explain their initial problem with screenshots, or with a Slack recording (literally 2 clicks, and then doing exactly the same as they would do during the first minute of the call anyway), to get quick context and then either solve the problem instantly or jump to a call. Nono, while those things get points for not being searchable, they still leave some useful information in the history (mainly some context and the moment in time when you had that conversation), and we can't risk having that.
Also, if they send the video, Slack might even include a searchable transcription (I don't know if that's a feature yet though), and if that wasn't bad enough, if you end up not knowing the answer, the video can even be shared with others on a shared channel to save time finding the person who knows how to fix the problem.
Hopefully you understand now the problems with async text communication.
You're being cynical, and it seems you're taking quite an extremist stance. My impression is you're blind to the value of ephemeral, low-ceremony discussion. Just bounce an idea off someone else and see what comes back. Sharing views, helping each other learn, etc.
Not everything needs to be recorded, in fact I'm very happy that most discussions aren't.
Spare me the crap. I've seen many company wikis full of stuff that nobody ever looks up, because it's mostly irrelevant, outdated, or plain wrong content.
The value of discussions mostly isn't in the things that can be recorded or searched later, but in the effort the participants put into it.
As stated before, this is up to a certain threshold. One or a few low-friction discussions per day can be very fine. It shouldn't take up the biggest part of your day's focus time.
He does have a point though. Easily "recordable" communication is bad for office politics.
Remember that old advice to email the boss a summary of what he told you to do verbally - asking for confirmation that you understood it right - to cover your ass?
I tend to favor calls with people who prefer calls, and text with people who prefer text. But if it starts getting abused, the situation becomes different.
I don't mind agreeing to a bare zero-effort "hi, can we go on a call?" from time to time. And I give a lot of leeway to juniors and new hires, bending over backwards more than I probably should. But if it becomes a habit you can bet I'll start delaying my responses until the other party starts putting some upfront effort.
I usually expect mutual respect. Helping each other as fellow professionals working towards the same (company) goal is one thing. Asking for hand-holding, or expecting zero effort to be repaid with non-zero effort, is a completely different thing.
If someone asks another person to go out of their way, ignore their other responsibilities, and give them their undivided attention right now in a way that's uncomfortable, then it's only fair for this to be a two-way street. At some point, maybe not now, maybe not soon, but at some point, some sort of reciprocation is expected if this trend continues.
My previous comment was mostly just me ranting about places where that reciprocation wasn't the case, because people only ever expect things to be done in the way that's comfortable for them specifically; which in my bubble has mostly been with the "calls-only" people, unable to hold any semblance of conversation over text. (EDIT: The actual topic might not be the same, but the mood definitely came from there.)
A pair programming session where we're both doing something, bouncing ideas, you check stuff on your end while I also check stuff here on my end, okay, that's a good thing.
But if it's just me remote-controlling that other person with my voice, I can't call that productive, and I'd rather play that pin-the-tail-on-the-donkey game while also working on my current task.
In my bubble, when people preferred only-calls it was usually also the case that they put almost zero effort when asking for help and just didn't want to bother spending literally 2 additional seconds to take a screenshot.
I can rant for even longer, for example how those same people (again, in my experience so far) prefer to go to the office to use a whiteboard in the name of efficiency, and somehow aren't bothered by the fact that you can't Ctrl+Z, that you can't move stuff around, that you can't rotate stuff, that you block the view while modifying, etc. And if you say that a few 90 EUR digital tablets would actually be more efficient, they say they can't afford it, but at the same time the managers travel (flights+hotels+transportation) to several countries in person to introduce themselves to their teams in person because this is actually more efficient.
And the people who prefer calls, and know that will be requesting calls frequently, and sharing screen frequently, and talking about stuff in their screen frequently, don't even purchase a cheap drawing tablet to make it easier for them to explain stuff graphically.
So yeah, take that as additional context for my rant.
TL;DR: If you prefer calls, I will tend to use calls with you until you abuse this, and this abuse usually happens eventually if I stay in a company long enough (fortunately not always).
Oh funny one. I have atm a part time contract where I go to the office twice per week.
I decided with a coworker today that we'll go to the office an extra day tomorrow because we'd rather do that than sit for a couple hours with headphones in our ears.
On our own, no management involved.
When it makes sense to talk, I can talk. Most of the time, it doesn't.
Not engaging enough. I write a couple lines, expect a couple lines some 1 to 180 minutes later that might miss half the point, etc. Totally different dynamic.
Having to cache in the previous state of the discussion whenever I receive a reply is exhausting. So some things are just not brought up.
And whenever it's actually an interactive synchronous live-chat, why not just hop on a call then?
Are you implying that 1) everything from the chat is immediately present in the brain when taking up the discussion at a later point, as well as 2) last time you left the chat, all the relevant context was well encoded as chat text in the first place?
It's certainly better to have chat text than to not have it, but whether it can make up for the huge cost of asynchronicity is another question (each delay breaks your flow, requiring you to cache-out, cache-in...)
In my experience from writing a toy compiler, the speedup you get with a reasonable set of optimizations, compared to spilling each temporary result to memory, is in that ballpark. There are vastly different situations of course, and very inefficient ways to write C code that would require some compiler smartness, but 2x is a number you'd have to counter with actual measured data to support the claims you made.
In many cases I'd suspect the caches are doing exactly what you alluded to, masking the inefficiencies of unnecessary writes to memory, at least to an extent. You might be able to demonstrate a speedup of 100x, but I suspect it would take some work or possibly involve an artificial use case.
That's just the usual resource ownership management problem that Rust is supposed to solve.
But a simple templated type like GP proposed does indeed fix the issue discussed here. To access the thing in the first place you need to lock the correct mutex. Looking at folly::Synchronized, locking doesn't even return the protected item itself directly. In most cases -- unless the bare pointer is needed -- you will access the item through the returned "locked view" object, which does the unlocking in its destructor.
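Roughly what that looks like in use (based on my reading of folly's docs; treat the exact API details as approximate): the data is only reachable through the lock.

#include <folly/Synchronized.h>
#include <vector>

folly::Synchronized<std::vector<int>> values;

void add(int v) {
    // wlock() returns a "locked view"; the mutex is released when `locked` is destroyed.
    auto locked = values.wlock();
    locked->push_back(v);
}

int count() {
    // Or pass a lambda, so the locked region is explicit and can't outlive the call.
    return values.withRLock([](const std::vector<int>& vs) {
        return static_cast<int>(vs.size());
    });
}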
Sure, Rust "just" enforces type safety. But without type safety a type can't help us much more than the textual advice did so I think that's a really big difference, especially at scale.
In a small problem the textual advice is enough, I've written gnarly C with locks that "belong" to an object and so you need to make sure you take the right lock before calling certain functions which touch the object. The textual advice (locks are associated with objects) was good enough for that code to be correct -- which is good because C has no idea how to reflect this nicely in the language itself nor does it have adequate type safety enforcement.
But in a large problem enforcement makes all the difference. I had maybe two kinds of lock, a total of a dozen functions which need locking, it was all in my head as the sole programmer on that part of the system. But if we'd scaled up to a handful of people working on that code, ten kinds of lock, a hundred functions needing locking I'd be astonished if it didn't begin to have hard to debug issues or run into scaling challenges as everybody tries to "keep it simple" when that's no longer enough.
GP isn't totally wrong. With folly::Synchronized, you can lock, take a reference, then unlock, and continue to (incorrectly/unsafely) use the reference. The compiler doesn't catch that.
#include <folly/Synchronized.h>

folly::Synchronized<int> lockedObj;
void oops() {
    auto lockHandle = lockedObj.wlock();
    auto& myReference = *lockHandle;
    lockHandle.unlock();  // lock released here...
    myReference = 5;      // ...but we still write through the reference: bad
}
Still, it is harder to misuse than bare locks unattached to data.
Yes, but you can also take a reference (copy a pointer), delete, and continue to use the reference etc. I was pointing out that this is simply a lifetime/ownership issue, not an issue specific to locking (and yes, Rust solves that issue, at least for the easy cases). And as far as the problem is protecting access to a locked resource, a class like folly::Synchronized does indeed solve the problem.
That sounds good in principle but is it practical? As long as you have something to do while holding the lock, chances are that implementing that something requires calling a function. That or code duplication.
In my experience it is practical. Let's say you have a shared linked list, you take the lock in your insert function, you insert an item, you give the lock back. No function calls.
Code that looks like what you describe, "implementing something that requires calling a function", tends to deadlock or be wrong. A really smart guy I worked with wrote some database driver that looked like that; it worked, except when it deadlocked, and finding that deadlock was a nightmare. I'm sure there are exceptions, but this rule will get you out of a lot of trouble. If you need to violate the rule, try to find a different synchronization/concurrency mechanism or a different data structure.
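To make the rule concrete, here's a plain std::mutex sketch (nothing folly-specific, just the shape of it): the critical section touches fields directly and calls nothing.

#include <mutex>

struct Node { int value; Node* next; };

struct SharedList {
    std::mutex m;
    Node* head = nullptr;

    void insert(Node* n) {
        std::lock_guard<std::mutex> g(m);  // take the lock
        n->next = head;                    // plain field accesses only: nothing here can
        head = n;                          // block or try to grab another lock
    }                                      // lock released when g goes out of scope
};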
Even if the code is initially correct, inevitably someone will refactor it without realizing a lock is taken and break it.
> you take the lock in your insert function, you insert an item, you give the lock back
yeah but what if "you insert an item" is literally hundreds of lines, and there are 3 layers of API functions below you? What if you need to take other locks, for example to apply backpressure / flush out data on the layers below?
> Code that looks like what you describe, "implementing something that requires calling a function", tends to deadlock or be wrong
It happens. What you do is you work hard until it's fixed.
I've dug into the filesystem layer of Linux for a while. Going through all the locking rules and making sure your fs is in line with them, that's not a lot of fun. Maybe you should tell the Linux filesystem people how to do it instead?
> yeah but what if "you insert an item" is literally hundreds of lines, and there are 3 layers of API functions below you? What if you need to take other locks, for example to apply backpressure / flush out data on the layers below?
Well, that's what software engineering is about. If inserting an item into a shared data structure is hundreds of lines of code, I'd say there's something very wrong. You shouldn't need to take another lock to create backpressure, e.g. look at Go's concurrency model.
I think it's a bad pattern as a rule. There are always situations where you break rules. My tip was for most of the situations where you don't do that and most of the people that shouldn't do that. If you know what you're doing, you understand concurrency very well and synchronization very well, then you probably don't need this tip. You can be a very smart and experienced developer and easily create stuff with rare deadlocks that's almost impossible to debug if you're not careful. I've fixed these sorts of issues in multiple code bases.
I've never worked on the Linux filesystem so I'm not going to tell them what to do. We'll have to assume the people working on that know what they're doing, otherwise it'd be a bit scary. Given that we don't see the Linux filesystem deadlocking - probably ok.
EDIT: I've given this rule to many junior/intermediate engineers and I've used it myself, so I would say it is applicable to almost any situation where you need to use locking. It results in code that is thread-safe and simply can't deadlock. This other deadlocking code base I worked on would have been much cleaner if this rule had been applied, and it could have been applied, and then it wouldn't deadlock once a year at one random customer site and take their system down. Again, like anything in software, sometimes you do things differently in different situations, but maybe the generalization of the rule is that you don't just sprinkle locks willy-nilly all over the place; you need to somehow rationalize/codify how the locks and structures work together in a way that guarantees no corner cases will lead to issues. And sure, at the "expert" level there are many more patterns for certain situations.
T *item = &this->shared_mem_region
->entities[this->shared_mem_region->consumer_position];
this->shared_mem_region->consumer_position++;
this->shared_mem_region->consumer_position %= this->slots;
Instead of that, you can do this:
uint64_t mask = slot_count - 1; // all 1's in binary
item = &slots[ pos & mask ];
pos++;
i.e. you can replace a division / modulo with a bitwise AND, saving a bit of computation. This requires that the size of the ringbuffer is a power of two.
What's more, you get to use sequence numbers over the full range of e.g. uint64_t. Wraparound is automatic. You can easily subtract two sequence numbers, this will work without a problem even accounting for wraparound. And you won't have to deal with stupid problems like having to leave one empty slot in the buffer because you would otherwise not be able to discern a full buffer from an empty one.
Naturally, you'll still want to be careful that the window of "live" sequence numbers never exceeds the size of your ringbuffer "window".
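Putting those pieces together, a sketch of the free-running-counter style (assuming a single producer and single consumer, and ignoring the atomics/memory-ordering a real concurrent queue needs): head and tail just keep counting, and unsigned wraparound makes the arithmetic work out.

#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>  // N must be a power of two
struct Ring {
    static_assert(N && (N & (N - 1)) == 0, "capacity must be a power of two");

    T slots[N];
    uint64_t head = 0;  // next sequence number to write
    uint64_t tail = 0;  // next sequence number to read

    bool empty() const { return head == tail; }      // no sacrificial empty slot needed
    bool full()  const { return head - tail == N; }  // correct even across wraparound

    bool push(const T& v) {
        if (full()) return false;
        slots[head & (N - 1)] = v;  // mask instead of modulo
        ++head;
        return true;
    }

    bool pop(T& out) {
        if (empty()) return false;
        out = slots[tail & (N - 1)];
        ++tail;
        return true;
    }
};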
This is in direct contradiction to what uecker says. Can you back up your claim -- for both C and C++? Putting your code in godbolt with -O3 did not remove the print statement for me in either C or C++. But I didn't experiment with different compilers or compiler flags, or more complicated program constructions.
I've often said that I've never noticed any surprising consequences from UB personally. I know I'm on thin ice here and running the risk of looking very ignorant. There are a lot of blog posts and comments that spread what seems like FUD from my tiny personal lookout. It just seems hard to come across measurable evidence of actual miscompilations happening in the wild that show crazy unpredictable behaviour -- I would really like to have some of it to even be able to start tallying the practical impact.
And disregarding whatever formulations there are in the standard -- I think we can all agree that insofar as compilers don't already do this, they should be fixed to reject programs with an error message whenever they can prove UB statically -- instead of silently producing something else or acting as if the code didn't exist.
Is there an error in my logic -- is there a reason why this shouldn't be practically possible for compilers to do, just based on how UB is defined? With all the flaws that C has, UB seems like a relatively minor one to me in practice.
This is an adaptation of the Raymond Chen post, and it seems to actually compile to a "return 1" when compiled as C++ (not as C), at least with the settings I tried. And even the "return 1" is understandable to me, given that we actually hit a bug and there are no observable side effects before the UB happens. (But again, the compiler should instead be so friendly as to emit a diagnostic about what it's doing here, or better, return an error.)
Un-comment the printf statement and you'll see that the code totally changes. The printf actually happens now. So again, what uecker says about observable effects seems to apply.
In this [1] example GCC hoists, even in C mode, a potentially trapping division above a volatile store. If c=0 you get one less side effect than expected before UB (i.e. the division by zero trap). This is arguably a GCC bug if we agree on the new standard interpretation, but it does show that compilers do some unsafe time travelling transformations.
Hoisting the loop invariant div is an important optimization, but in this case I think the compiler could preserve both the optimization and the ordering of the side effects by loop-peeling.
Thanks for the example. But again I can't see a problem. The compiler does not actually prove UB in this case, so I suppose this doesn't qualify as applying (mis-) optimizations silently based on UB. Or what did I miss?
Let's not get pedantic about what "proving UB" actually means -- that might lead to philosophical discussions about sentient compilers.
Fact is that in this instance, the compiler did not remove a basic block of code (including or excluding "observable side effects" leading up to the point of the UB happening). It would not be valid for the compiler to assume that the path is never taken in this case, even assuming that UB never happens, because depending on the value of the variables, there are possible paths through the code that do not exhibit UB. In other words, "the compiler wasn't able to prove UB".
So this is not an instance of the situation that we are discussing. The emitted code is just fine, unless a division by zero occurs. Handling division by zero is responsibility of the programmer.
Nobody is arguing that UB can lead to weird runtime effects -- just dereference an invalid pointer or whatever.
The issue discussed is that, based on assumptions about UB, the compiler emits code that does not correspond to the source in an intuitive way; for example, a branch of code is entirely removed, including any observable side effects that logically happened before the UB.
Now the point of the GGP poster is probably that the observable side effect (the volatile access) does not happen at all because the UB happens first. But I would classify this case differently -- the volatile access is not elided from the branch.
Furthermore, it might well be (and let me assume so) that the volatile access and the division operation that causes the UB are not defined as happening in a strict sequence (because, as I'm assuming again like any reasonable standards layman would, UB is not itself considered a side effect -- that would kind of defeat the point, disallowing optimizations). So it's entirely valid for the compiler to order the operation that causes the (potential) UB before the volatile access.
> The issue discussed is that, based on assumptions about UB, the compiler emits code that does not correspond to the source in an intuitive way; for example, a branch of code is entirely removed, including any observable side effects that logically happened before the UB.
That's literally what happens in my example: the div is hoisted above the volatile read which is an observable side effect. The practical effect is that the expected side effect is not executed even if it should have happened-before the SIGFPE.
uecker claims that the UB should still respect happens-before, and I'm inclined to agree that's a useful property to preserve.
And I don't see any significant difference between my example and what you are arguing.
The compiler is moving a potentially UB operation above a side effect. This contradicts uecker non-time-traveling-ub and it is potentially a GCC bug.
If you want an example of GCC removing a side effect that happens-before provable subsequent UB: https://godbolt.org/z/PfoT8E8PP but I don't find it terribly interesting as the compiler warns here.
extern volatile int x;

int ub() {
    int r = x;    /* volatile read: an observable side effect */
    r += 1/0;     /* division by zero: undefined behavior */
    return r;
}
and the output is
ub:
        mov     eax, DWORD PTR x[rip]
        ud2
I don't see which side effect you're saying is removed here?
As for the earlier example (hoisting the division out of the loop), I was going to write a wall of text explaining why I find the behaviour totally intuitive and in line with what I'd expect.
But we can make it simpler: the code doesn't even have any observable side effect (at least I think so), because it only reads the volatile, never writes it! The observable behaviour is exactly the same as if the hoist hadn't happened. I believe it's a totally valid transformation; at least I don't have any concerns with it.
Here I've inserted an increment of the volatile (i.e. a write access) at the start of the loop. If the divisor is 0, then in the optimized version with the division hoisted out of the loop, the increment will never actually happen, not even once, whereas it should in fact happen once at the beginning of the first loop iteration with the "unoptimized" code.
I don't find this off-putting: first, the incrementing code is still in the output binary. I think what is understood by "time travel", and what would be off-putting to most programmers, is if the compiler were making static inferences and removing entire code branches based on them -- without telling the user. If that were the case, I would consider it a compiler usability bug. But that's not what's happening here.
Second, I think everybody can agree that the compiler should be able to reorder a division operation before a write access, especially when hoisting the division out of a loop. So while maybe an interesting study, I think the behaviour here is entirely reasonable -- irrespective of standards. (But again, I don't think uecker, nor anyone else, said that the compiler may never reorder divisions around side-effecting operations just because the division "could" be UB).
Well, that hoisting is contrary to what uecker says is the standard intent.
I think that arguing about omitted branches is a red herring; there is no expectation that the compiler should emit branches or basic blocks that match the source code, even in the boring, non-UB case.
The only constraint to compiler optimizations is the as-if rule and the as-if rule only requires that side effects and their order be preserved for conforming programs. Uecker says that in addition to conforming programs, side effects and their ordering also need to be preserved up to the UB.
I do of course also find it unsurprising that the idiv is hoisted, but, as the only way the standard can constrain compilers is through observable behaviour, I don't see how you can standardize rules where that form of hoisting is allowed while the others are not.
In fact the compiler could easily optimize that loop while preserving the ordering by transforming it to this:
extern volatile int x;

int ub(int d, int c) {
    int r = 0;
    /* peeled first iteration: the volatile accesses happen before the division */
    x += 3;
    r += x;
    int _div = d / c;
    r += _div;
    for (int i = 1; i < 100; ++i) {
        x += 3;
        r += x;
        r += _div;
    }
    return r;
}
This version preserves ordering while still optimizing away the div. In fact this would also work if you replaced the volatile with a function call, which currently GCC doesn't optimize at all.
Thanks for clarifying, I understand much better now.
And I think I can agree that under a strict interpretation of the rule that UB doesn't get reordered with observable behaviour the GCC output in the godbolt is wrong.
Maybe it has something to do with the fact that it's volatiles? I've hardly used volatiles, but as far as I know their semantics have traditionally been somewhat wacky -- poorly understood by programmers and inconsistently implemented by compilers. I think I once read that a sequence of volatile accesses can't be reordered with respect to each other, but that other memory accesses can very well be reordered around volatile accesses. Something like that -- maybe the rules in the compiler are too complicated, leading to an optimization like this one, which seems erroneous.
But look at this, where I've replaced the volatile access with a printf() call as you describe: https://godbolt.org/z/Ec8aYnc3d . It _does_ get optimized if the division comes before the printf. The compiler seems to be able to do the hoisting (or maybe that can be called "peeling" too?). But not if you swap the two lines such that the printf comes before the division. Maybe the compiler does in fact see that to keep ordering of observable effects, it would have to duplicate both lines, effectively duplicating the entire loop body for a single loop iteration. In any case, it's keeping both the printf() and the div in the loop body.
I do believe that by a strict reading of the standard GCC is non conforming here. This reading of the standard is not agreed by the GCC developers though: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104800
If the first div happens before the first printf, then it can be CSE'd out of the loop, as any trap would have happened before the printf anyway, so there is no reordering; and if it didn't trap the first time, it wouldn't have trapped later either. In this case CSE is fine and nothing is reordered.
If the div happens after the printf, then reordering is prohibited not only to preserve side effects before UB (which we have seen GCC doesn't necessarily respect), but because for the most part printf is treated as an opaque function: it could legitimately exit, longjmp out of the function, or never return, so on the abstract machine the UB might not happen at all. So it is not safe to hoist a trapping instruction like div above opaque function calls (but it is safe to sink it below them).
Still, the modification I showed for volatile can be applied as well: peel the first iteration out of the loop so that the first printf is done before computing the div that gets CSE'd out of the loop. But GCC doesn't do it, although it seems desirable.
I'm very sorry, yes, you are right; of course the load is still there. I was so fixated on producing a minimal test case that I failed to interpret the result.
Now I'm not able to reproduce the issue with a guaranteed UB. I still think the loop variant shows the same problem though.
In any case, yes, according to the C standard a volatile read counts as an observable side effect.
The implementation can assume that the program does not perpetrate undefined behavior (other than undefined behavior which the implementation itself defines as a documented extension).
The only way the program can avoid perpetrating undefined behavior in the statement "x = x / 0" is if it does not execute that statement.
Thus, to assume that the program does not invoke undefined behavior is tantamount to assuming that the program does not execute "x = x / 0".
But "x = x / 0" follows printf("hello\n") unconditionally. If the printf is executed, then x = x / 0 will be executed. Therefore if the program does not invoke undefined behavior, it does not execute printf("hello\n") either.
If the program can be assumed not to execute printf("hello\n"), there is no need to generate code for it.
Look at the documentation for GCC's __builtin_unreachable:
> If control flow reaches the point of the __builtin_unreachable, the program is undefined. It is useful in situations where the compiler cannot deduce the unreachability of the code.
The unreachable code assertion works by invoking undefined behavior!
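For example (the exact codegen varies by compiler and flags, so take this as an illustration): the default branch asserts a precondition the compiler can't deduce on its own, and the compiler may generate code as if that branch simply can't execute.

// The caller promises that kind is always 0, 1 or 2.
int price(int kind) {
    switch (kind) {
    case 0: return 100;
    case 1: return 250;
    case 2: return 999;
    default: __builtin_unreachable();  // undefined behavior if ever reached;
                                       // the compiler assumes it never is
    }
}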
x/0 is not reached if the printf blocks forever, exits, or returns via an exceptional path (longjmp in C, exceptions in C++). Now, specifically, standard printf won't longjmp or exit (but glibc's can), but it still can block forever, so in practice the compiler can't hoist UB over opaque function calls.
edit: this is in addition to the guarantees with regard to side effects that uecker says the C standard provides.
But does `printf();` return to the caller unconditionally?
This is far from obvious -- especially once SIGPIPE comes into play, it's quite possible that printf will terminate the program and prevent the undefined behavior from occurring. Which means the compiler is not allowed to optimize it out.
`for(;;);` does not terminate; yet it can be removed if it precedes an unreachability assertion.
The only issue is that writing to a stream is visible behavior. I believe that it would still be okay to eliminate visible behavior if the program asserts that it's unreachable. The only reason you might not be able to coax the elimination out of compilers is that they are being careful around visible behavior. (Or, more weakly, around external function calls).
Yeah, but do you have an actual instance of "time travel" happening? Without one, the issue is merely a theoretical discussion of how to understand or implement the standard. If you provide a real instance, the practical impact and possible remedies could be discussed.
#include <stdio.h>

int f(int y, int a) {
    int x, z;
    printf("hello ");
    x = y / a;          /* potential UB if a == 0 */
    printf("world!");
    z = y / a;          /* same division again, so the compiler can CSE it */
    return x + z;
}
In godbolt, it seems the compiler tends to combine the two printfs together. So if a=0, it leads to UB between the printfs, but that won't happen until after the two printfs. Here the UB is delayed. But will the compiler actually make sure that in some other case, the y / a won't be moved earlier somehow? Does the compiler take any potentially undefined behavior and force ordering constraints around it? ... The whole point of UB is to be able to optimize the code as if it doesn't have undefined behavior, so that we all get maximum optimization and correct behavior as long as there's no UB in the code.