That still sounds like you might be failing to understand that a compiler based on llvm can (and usually does) produce native machine code that is fully compiled and not running in any kind of JIT or virtual machine environment. And compilers like clang let you turn off optimizations just like gcc does.
Clang and various gcc ports are my daily driver tool-sets. However, they are used in vastly different contexts for well specified reasons (notably, the very same reasons Rust is inappropriate where gcc shines.)
Perhaps there is some subtlety lost in my words, but I assure you it will be unproductive to guess what my thoughts are... given they primarily revolve around cheese Goldfish Crackers.
We could talk about the endangered Breviceps macrops instead... those creatures are hilarious. =3
> Clang and various gcc ports are my daily driver tool-sets. However, they are used in vastly different contexts for well specified reasons
"Well specified" where exactly? You keep alluding to the existence of some significant, fundamental differences with llvm-based tools that disqualify them from their intended use cases, without providing any hint of what you think those differences are.
> Perhaps there is some subtlety lost in my words, but I assure you it will be unproductive to guess what my thoughts are
It's so hard to understand what exactly it is you're trying to communicate that I cannot tell if you are being deliberately bad at communicating, and whether linking to a video of a frog is intended as an insult. Please put some effort into clearly stating fully-formed thoughts.
llvm's optimizations essentially make the execution timing of the source code's implicit structure unpredictable from one compilation to the next ( https://en.wikipedia.org/wiki/Code_motion .)
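To make the code-motion point concrete, here is a minimal C sketch (the function and names are mine, purely for illustration): at -O0 both gcc and clang keep the multiply inside the loop in source order, while at -O2 loop-invariant code motion is typically free to hoist it out, so the per-iteration timing no longer matches what the source implies.

    /* sketch: a*b is loop-invariant, so an optimizer may hoist it
       out of the loop; the multiply then runs once instead of n
       times, and the source order stops predicting execution timing */
    int scaled_sum(const int *v, int n, int a, int b) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += v[i] * (a * b); /* computed each iteration at -O0 */
        return s;
    }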
If all you care about is perceived latency under human perception, then the cooked performance of llvm can improve 12% to 23% over traditional gcc for multitasking/Application use-cases in my observation ( https://llvm.org/docs/CompileCudaWithLLVM.html .)
Are you confusing Real-Time and guaranteed-latency-schedulers as being the same thing? This is a concept commonly conflated by marketers, and the naive.
I kind of like the newer RP204x chip designs (though I still would never use them commercially yet), as they try to handle GPIO domain RT and Application coding paradigms on the same silicon in a way people can better comprehend ( https://en.wikipedia.org/wiki/Clock_domain_crossing .)
Don't worry about it, at some point it will make sense... or not... cheese Goldfish crackers nom nom nom =3
Ok, your first paragraph seems to indicate you're talking about compile-time behavior, but you don't seem to believe that optimizations can be disabled in llvm to the same extent as in gcc, and maybe don't believe that clang can be used for fully reproducible builds.
Your second paragraph does not appear to make any sense on its own or in the context of the rest of your comment. "the cooked performance of llvm" looks like a translation error. Nothing you're talking about has any relevance to a link about CUDA.
Repeating the assertion that using gcc with optimizations disabled is sometimes desirable does not help explain why it is desirable, nor why equivalent results cannot be achieved using llvm. If you actually understand what you're talking about, it sounds like there might be an opportunity for an interesting and enlightening discussion about the specifics of compiler internals. But instead you again veer toward vague insults and totally off-topic tangents. Please don't do that.
The metastability in clock domains is a complex subject (it is a named problem with a finite number of solutions), and you were warned people have difficulty understanding why it is a challenge.
A toy example:
1. set flag foo=1
2. check flag bar=1
3. set flag foo=0
4. check flag bar=0
These events happen in a sub-optimal but repeatable order in non-optimized gcc C, with precise, predictable/verifiable op timing. We will assume interrupts are disabled in critical sections, and register states are repeatable prior to execution.
Yet the same "bar" check may behave unpredictably under llvm: the flag states look unrelated to the compiler, the order of operations may not be preserved once abstracted, and the linear source timing becomes whatever packed-op ordering a particular compilation happened to produce. Thus, an event may appear to complete before the parallel state has actually occurred.
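For what it's worth, here is roughly what that toy sequence looks like as plain C (foo and bar are hypothetical flags shared with something outside the compiler's view, e.g. a peripheral in another clock domain). With nothing marking them as externally modified, any optimizing compiler, llvm or gcc alike at -O2, may cache bar in a register or treat the first foo store as dead:

    /* hypothetical shared flags -- plain ints, so nothing tells the
       compiler that outside hardware reads or writes them */
    int foo, bar;

    void toy(void) {
        foo = 1;             /* 1. set flag foo=1 */
        while (bar != 1) {}  /* 2. check flag bar=1 -- bar may be loaded
                                once and cached, spinning forever      */
        foo = 0;             /* 3. set flag foo=0 -- the foo=1 store may
                                be merged away as dead                 */
        while (bar != 0) {}  /* 4. check flag bar=0 */
    }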
Perhaps I am not the right person to explain this subject, but I would recommend looking at the Zynq FPGA Linux kernel example DMA module structure for more details. It should demonstrate how to coordinate parallel things better using one or more of the known metastability workarounds, within the constraints of state propagation across complex logic fabric.
In some use-cases repeatability is more important than performance/assumptions-of-safety. There is a clear difference between Application space and RT space.
This is as far as I can guide folks on the subject, and if one has never encountered these issues... then don't worry about it... Best of luck, =3
It sounds to me like you're claiming gcc is superior on the basis of incorrect code that doesn't make proper use of the volatile keyword and explicit memory barriers.
> "volatile" will not help with this class of problem.
Well, it's a good thing my comment had more words past that one. "volatile" is only part of the solution, and when more than one MMIO location is in play (as in your example), "volatile" on its own is insufficient but proper use of "volatile" plus explicit memory barriers will enforce correct ordering of loads and stores.
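A minimal sketch of that fix, assuming the flags from the earlier toy example are memory-mapped registers (the addresses here are made up for illustration). Both gcc and clang preserve the order and number of volatile accesses, and the asm memory clobber keeps surrounding ordinary memory operations from drifting across it:

    #include <stdint.h>

    /* hypothetical MMIO addresses, for illustration only */
    #define FOO ((volatile uint32_t *)0x40000000u)
    #define BAR ((volatile uint32_t *)0x40000004u)

    /* compiler barrier: no memory access may be reordered across it;
       real hardware may additionally need a hardware barrier such as
       __sync_synchronize() or an arch-specific dmb/mfence            */
    #define barrier() __asm__ __volatile__("" ::: "memory")

    void toy(void) {
        *FOO = 1;            /* volatile store: cannot be elided/merged */
        barrier();
        while (*BAR != 1) {} /* fresh volatile load every iteration */
        *FOO = 0;
        barrier();
        while (*BAR != 0) {}
    }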