
Why volatile is hard to specify and implement - luu
http://kristerw.blogspot.com/2016/04/why-volatile-is-hard-to-specify-and.html
======
caf
The example saying:

 _This means that the compiler is allowed to optimize_

    
    
      volatile int *p = somevalue;
      int s = *p + *p;
    

_as if it said_

    
    
      volatile int *p = somevalue;
      int tmp = *p;
      int s = tmp + tmp;
    

is incorrect. The clause about sequence points starts with "Furthermore, " so
it is an _additional_ restriction, _and_ to boot it only talks about stores to
the volatile-qualified object, not loads, so it is entirely inapplicable to
this example.

The prior clause saying _"Therefore any expression referring to such an
object shall be evaluated strictly according to the rules of the abstract
machine, as described in 5.1.2.3."_ applies, and the rules of the abstract
machine for the binary + operator are that each operand is evaluated (in an
unspecified order). So both dereferences must be evaluated.
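
To make the difference concrete, here is a minimal sketch of the two forms as functions. The names `sum_two_reads` and `sum_one_read` are mine, not from the article. Under the reading above, a compiler has to emit two loads for the first function and may not quietly turn it into the second. On ordinary memory both return the same value; the difference only shows up when the location can change between reads or when reading it has side effects.

```c
/* Two volatile accesses in one expression: each operand of + is
   evaluated, so the abstract machine reads *p twice (in an
   unspecified order). */
int sum_two_reads(volatile int *p)
{
    return *p + *p;
}

/* One volatile access whose value is reused: the form the article
   claims the compiler is allowed to produce. */
int sum_one_read(volatile int *p)
{
    int tmp = *p;
    return tmp + tmp;
}
```

Comparing the generated assembly of the two functions at a high optimization level is the easiest way to check what a particular compiler actually does.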

~~~
vrfcodf
I agree, I think this part makes it clear: _except as modified by the unknown
factors mentioned previously._

Although this seems quite contradictory:

 _5.1.2.3. p4: In the abstract machine, all expressions are evaluated as
specified by the semantics. An actual implementation need not evaluate part of
an expression if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a function or
accessing a volatile object)_

------
rdtsc
Volatile is not just for memory mapped hardware, but is also needed when using
shared memory, say between two processes.

I have been bitten by missing volatile qualifiers on data residing in shared
memory. The funny (and in retrospect obvious) thing is that it worked in debug
mode, but in production the data was not seen by the other process. It was a
fun exercise to find that one.
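
A minimal sketch of that failure mode, assuming a flag that lives in a shared mapping. The name `wait_for_flag` and the spin limit are made up for illustration. Without `volatile`, an optimizing compiler may load `*flag` once and hoist the load out of the loop, which would match the works-in-debug, fails-in-production symptom. Note that `volatile` only constrains the compiler here; by itself it gives no atomicity or ordering guarantees between processes.

```c
/* Poll a flag that another process may set. The volatile qualifier
   forces a fresh load of *flag on every iteration.
   Returns 1 if the flag was seen set, 0 if max_spins polls passed
   without seeing it. */
int wait_for_flag(const volatile int *flag, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (*flag != 0)
            return 1;
    }
    return 0;
}
```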

~~~
burfog
It's not for memory-mapped hardware. This is unsafe. You need something to
flush the pipeline, prevent store merging, prevent caching, and more.

It's nearly banned from the Linux kernel. It gets used in one place, for the
clock tick counter.

There is one other good use: suppression of compiler bugs.

~~~
Avernar
Volatile is for memory mapped IO. Some processors may need extra instructions
or configuration but that doesn't replace the need for volatile with MMIO.

On x86 the MTRR/PAT mechanisms are what disable caching and write combining and
enable write-through for MMIO address ranges.

~~~
burfog
Yes it does replace the need for volatile. The actual access is done via
assembly language, either inline (better) or not.

You could make volatile work, probably, by putting inline assembly both before
and after it. In that case you could probably even skip marking things
volatile, because the inline assembly will make the compiler behave. Why
bother? This is more complicated than just doing the access from assembly.

MTRR/PAT is often not enough, or it would only be enough if you killed
performance elsewhere. (different types of MMIO in the same page) It's also
not something you can rely on if you want to write portable code.

~~~
Avernar
Yes, you can use memory barriers for MMIO but you'd have to put them between
every single statement that touches the MMIO address range. Not only will that
bloat your code it would slow down access to all your regular variables as the
compiler would have to reload everything constantly and the processor wouldn't
be able to execute anything out of order.

Why do you think volatile won't work for MMIO with the proper MTRR/PAT setup?
The volatile keeps the compiler from messing with the reads/writes and the
MTRR/PAT makes the processor perform all reads and writes exactly in the order
the instructions tell it to without any caching or write combining.

"This is more complicated than just doing the access from assembly." and "It's
also not something you can rely on if you want to write portable code." So in
one paragraph you're telling me to do it in assembly and in another you want
portable code.

The Linux kernel has gone above and beyond using volatile but in embedded it's
how IO is done when using C.
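
For reference, the embedded style looks roughly like the sketch below. The register layout, the `UART_TX_READY` bit and the `uart_putc` name all belong to a made-up UART, not to any real part. The `volatile` members keep the compiler from caching the status read or dropping the data write.

```c
#include <stdint.h>

/* Hypothetical register block of a made-up UART. */
struct uart_regs {
    volatile uint32_t data;    /* write: transmit one byte   */
    volatile uint32_t status;  /* bit 0: transmitter ready   */
};

#define UART_TX_READY 0x1u

/* Poll the status register until the transmitter is ready, then
   write one byte. Returns 0 on success, -1 if the device was not
   ready within max_spins polls. */
int uart_putc(struct uart_regs *uart, char c, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (uart->status & UART_TX_READY) {
            uart->data = (uint32_t)(unsigned char)c;
            return 0;
        }
    }
    return -1;
}
```

On real hardware `uart` would be a fixed address from the datasheet cast to `struct uart_regs *`; in a test it can simply point at a struct in RAM.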

~~~
burfog
I make the assumption that you put the inline assembly in a macro. This keeps
your C portable. The macro gets pulled in from a header file that is specific
to the CPU.
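
A rough sketch of what such a CPU-specific header could contain, assuming GCC or Clang extended asm on x86. The names `mmio_read32` and `mmio_write32` are hypothetical, and inline functions stand in for the macro. The non-x86 branch is only a placeholder so the sketch compiles elsewhere; a real port would supply its own instructions there.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* x86 version: exactly one 32-bit load or store per call. The
   "memory" clobber keeps the compiler from moving other memory
   accesses across the instruction. */
static inline uint32_t mmio_read32(const volatile uint32_t *addr)
{
    uint32_t val;
    __asm__ __volatile__("movl %1, %0" : "=r"(val) : "m"(*addr) : "memory");
    return val;
}

static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
{
    __asm__ __volatile__("movl %1, %0" : "=m"(*addr) : "r"(val) : "memory");
}
#else
/* Placeholder for other CPUs in this sketch: plain volatile accesses. */
static inline uint32_t mmio_read32(const volatile uint32_t *addr)
{
    return *addr;
}

static inline void mmio_write32(volatile uint32_t *addr, uint32_t val)
{
    *addr = val;
}
#endif
```

The calling C code only ever sees `mmio_read32` and `mmio_write32`, so it stays the same across CPUs while the header changes.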

MTRR/PAT won't save you, at least without a huge performance loss, when you
want write combining (etc.) for _most_ of the addresses in a page of address
space but not all of them.

MTRR/PAT won't save you from reordering that happens in a PCI bridge. It's
allowed and it is done.

~~~
Avernar
We're into semantics here. Portable requiring a CPU shim vs portable not
requiring one. More complicated cpu/bus interfaces require the first one but
if you can just use volatile for IO as done in embedded work you get much
smaller and/or faster code.

If a device has registers that are sensitive to multiple reads/writes of the
same address and ones that need caching and write combining for performance in
the same 4K page then it was designed poorly. :)

PCI bridges need to support legacy mode. Otherwise DOS programs and drivers
wouldn't run correctly. The BIOS sets up a compatibility MTRR setup for this
reason and you can see the Linux kernel print it out on boot.

So I'm basically agreeing and disagreeing with you on this one. Once the
hardware gets past a certain complexity you need to move from volatile to
accessor functions/macros.

