
Bridging the Gap Between Programming Languages and Hardware Weak Memory Models - godelmachine
https://arxiv.org/abs/1807.07892
======
nielsbot
Could someone provide a non-academic summary of what this means? I am a
programmer but not a computer scientist...

~~~
codebje
The simplest way to think about memory is as a place where one executed
instruction writes a value into a slot, and the next instruction can read it
back.

In practice, things are more complicated. The next instruction might be from a
different process, executing on a different (hyper)core. Or the next
instruction executed might not be the next instruction in the program, because
the CPU may be performing out-of-order execution. The CPU will have several
layers of cache as well, and cache coherency protocols aren't always engaged.

In a language like C, you can declare a variable to be volatile. When the
program is compiled, this causes the compiler to emit memory barrier
instructions - these barriers force the CPU to do the work to ensure some
semblance of sequential memory access is restored across the two sides of the
barrier. Such barriers may be implicit in the memory semantics of the
language itself, or exposed as explicit operations in the language.

However, it's not easy to be sure that the barriers that end up in compiled
and executed code actually provide the memory semantics that the language
specified. This is exacerbated by the different semantics of different
hardware architectures, and the overall complexity of the problem space.

The paper addresses this problem by introducing an intermediate memory model
(IMM), which is amenable to formal verification of compiler outcomes. The
authors map this model to a handful of popular architectures, showing that
the mappings are correct (more or less, that for all possible execution
flows, the observable behaviour is the same). They also map RC11 (Repaired
C++11, which fixes an unsound semantics of concurrent memory accesses in the
original C++11 model) and the promising semantics onto IMM.

Assuming the proofs are correct (a non-trivial assumption in this field,
unfortunately), this is a step forward in formally verifying that the output
of compilers actually behaves the way the program should according to the
programming language's semantics. There aren't many formally verified
compilers around, and those that do exist cover only subsets of the languages
we might use, limiting their applicability.

~~~
cesarb
> In a language like C, you can declare a variable to be volatile. When the
> program is compiled, this causes the compiler to emit memory barrier
> instructions - these barriers force the CPU to do the work to ensure some
> semblance of sequential memory access is restored across the two sides of
> the barrier.

That is the case in Java, but not in C (which is a source of confusion to
those who learned Java first). In C, volatile only guarantees that the
compiler will not use a cached value when reading from the variable, and will
not omit an apparently redundant write when writing to the variable. It's
mostly useless except when dealing with memory-mapped hardware devices (where
reads and writes have side effects), with Unix signals on single-threaded
programs (for instance, having SIGTERM set a variable read by the main loop -
without volatile, the compiler might read only once), or with setjmp/longjmp.

~~~
toadworrier
Yes, it's useful for the problems that were around when it was invented and
useless for the new problems that have arisen since.

I want my old problems back.

