
Cmd/compile: Go 1.8 regression: sync/atomic loop elided - ainar-g
https://github.com/golang/go/issues/19182
======
vessenes
This is interesting, actually.

As the language is specified, it's not wrong to just ignore that add
completely -- the variable isn't used inside the goroutine, and you're not
guaranteed a goroutine will run at any particular time.

So, code executed right after that routine is spawned will be hitting the
'old' value anyway; hence Dmitry Vyukov's comments that this is arguably to spec, or
within spec.

Think about what would happen if you actually wrote this code; you'd see some
weird behavior if the compiler did what you 'expect': a couple of outputs of
low integers, then some high ones as the goroutine gets scheduled in. So it's
almost certainly just 'wrong' to write something like this, under most (all?)
circumstances.

Because a sync.WaitGroup isn't used, there's no guarantee the goroutine will
execute at all before the program exits.

That said, I bet there's some related more reasonable scenario where you could
argue the compiler should at least do the adding. I'm not sure why atomic.Add*
should get special treatment though, and it looks like the fix is to remove
some escape-analysis type code, so I'm curious how this will get fixed.

~~~
alecrn
Yeah, this particular instance seems ok to me. This one makes the example feel
weirder:

    
    
      package main
      
      import (
      	"fmt"
      	"runtime"
      	"sync/atomic"
      	"time"
      )
      
      var a uint64
      
      func main() {
      	runtime.GOMAXPROCS(runtime.NumCPU())
      	fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))
      	started := make(chan bool)
      
      	go func() {
      		started <- true
      		for {
      			atomic.AddUint64(&a, uint64(1))
      		}
      	}()
      
      	<-started
      	for {
      		fmt.Println(atomic.LoadUint64(&a))
      		time.Sleep(time.Second)
      	}
      }
    

Here we explicitly wait until the goroutine has started, so we know it's
scheduled by the time our other loop runs. On my computer with go1.8
linux/amd64 it still optimizes out the atomic loop, which makes sense: nothing
here would convince the optimizer, under the compiler's current logic, that
the loop should remain.

If you add time.Sleep(time.Millisecond) to the goroutine loop, or any other
synchronization, it works fine. I'm having trouble thinking of a real-world
example where you'd want an atomic operation going ham in a loop without any
sort of timing or synchronization. At the very least, a channel indicating
when the loop is done would keep the loop from being compiled away.

~~~
flik
> Strictly saying this is conforming behavior as we don't give any guarantees
about scheduler (just like any other language, e.g. C++). This can be
explained as "the goroutine is just not scheduled".

Not a Go developer. On a multi-processor machine, how is this conforming
behavior? Is "the scheduler cannot give any guarantees" acceptable?

~~~
jerf
'Is "scheduler cannot give any guarantees" acceptable?'

Most schedulers give far fewer _guarantees_ than you might think. To be a
guarantee, a property must hold no matter what you do within the boundaries of
the language. If you create a goroutine fork bomb

    
    
        func fork_bomb() {
            for {
                go fork_bomb()
            }
        }
    

Go doesn't, to the best of my knowledge, guarantee that any other goroutine
will get any execution time, or guarantee much of anything will happen. Your
OS is likely to have similarly weak guarantees for the equivalent process
bomb, unless you do something to turn on more guarantees/protection.

You have to go into some relatively special-purpose stuff before you can get
schedulers that will guarantee that some process will get scheduled for at
least 10ms out of every 100ms or something. And then, once you get that
guarantee, you'll pay some other way.

Given the fact that most of our machines are incredibly powerful, and that a
lot of them still get upgraded on a fairly routine schedule in a lot of
dimensions even if single-core clock speed has stalled out, most of us prefer
to work with things that just promise to do their best as long as you don't
overload them, because the other prices you have to pay to get _guarantees_
turn out not to be beneficial on our monster machines. Of course one should
always have their eyes out for when that may not be the case at some point,
but in general we're headed away from rigorous guarantees in favor of shared
resources that are cheaper and more scalable and making up the difference in
volume, rather than trying to get better guarantees.

------
the8472
It is strange: the atomic package[0] exposes various low-level atomic
functions, but none of them seem to take ordering arguments[1], and the docs
don't mention a default order either.

Atomicity and ordering constraints are orthogonal, so you can have an atomic
add but relaxed ordering, which allows the compiler to do lots of
optimizations on those adds, presumably to the point where they are optimized
out completely.

If this is the cause, it seems like pretty bad API design not to expose the
ordering flags on low-level library functions. Even high-level languages like
Java expose them.[2]

[0] [https://golang.org/pkg/sync/atomic/](https://golang.org/pkg/sync/atomic/)
[1]
[http://en.cppreference.com/w/cpp/atomic/memory_order](http://en.cppreference.com/w/cpp/atomic/memory_order)
[2]
[http://download.java.net/java/jdk9/docs/api/java/lang/invoke...](http://download.java.net/java/jdk9/docs/api/java/lang/invoke/VarHandle.AccessMode.html)

~~~
fmstephe
There is no official ordering provided by the atomic package, and an atomic
package isn't very useful without ordering semantics. Yet ordering actually
_is_ provided; it's just not part of the documentation.

The background is that the Go team was initially reluctant to even make the
atomic package public. It's mostly there to support the runtime and the sync
package, which need fiddly low-level concurrency primitives. But it exists and
it is public. Some people on the Go team regret this (I believe).

Giving clear and precise semantics to the atomic package was delayed because
it's a lot of difficult work and you only get to do it once. There was
discussion around trying to find clear and simple semantics.

In the meantime the atomic package actually had ordering, because you would
have a hard time writing the runtime or the sync package without it. The race
detector also assumes some kind of ordering for operations in the atomic
package; Dmitry Vyukov confirmed that it assumes C/C++11 acquire/release
semantics for atomic operations.

After some discussion it was decided that C/C++11 semantics should be adopted,
but the documentation has not been updated to reflect this. Updating the
documentation doesn't appear to be planned right now, which is a shame.

[https://github.com/golang/go/issues/5045](https://github.com/golang/go/issues/5045)

Above I have sort of put words into the mouths of the Go team (or some members
of it). It's not my intention to bad-mouth anyone, and I think everyone was
making decisions for good reasons, whether I agree or not :)

With respect to adding ordering arguments, Go most likely won't get those for
a very long time.

------
akkartik
Quote of the day:

 _"Language implementation[s] become useless when they take formal properties
to the extreme."_

([https://github.com/golang/go/issues/19182#issuecomment-28107...](https://github.com/golang/go/issues/19182#issuecomment-281076863))

------
YZF
As many of the comments point out, it looks like this isn't a compiler bug. The
example code makes assumptions about the goroutine execution order without
using the facilities (e.g. channels) that guarantee the right ordering. The
use of atomic doesn't seem to be relevant.

The example code is what I call "works by chance" which I would contrast with
code that "works by design". I think some people are trying to argue that a
language should prevent developers from writing code that "works by chance"
but I don't think that's actually possible. It's the equivalent of saying I
want to be able to type anything I want into my editor and have it compile
into doing what I think it should do... A "works by design" code here would be
adding channel synchronization points between the two goroutines such that the
desired execution order is guaranteed.

It's also IMO not reasonable to expect a new compiler to produce identical
results to an old compiler. E.g. if the compiler does a better job your code
may run faster (or slower) which may break something if you've assumed that
something takes so long to run (e.g. some hard coded timeout in some other
component).

At least the 1.8 compiler behaviour is deterministic. So code written assuming
some particular value of a would either work or fail consistently. Previously
you could construct some code that fails randomly.

------
LeanderK
Can somebody comment on what's happening here? I don't know Go and I can't
put the pieces together right now. Also, aren't optimizing transformations
checked (verified) for correctness? If not, how can the authors be confident
in the optimizations? Simple tests? Or is it a bug?

~~~
yalue
Variable writes that look like they have no side effects may actually be
intended for another thread to read (though hopefully most reasonable programs
don't do this without good reason). However, many concurrency models say that
this method of inter-thread communication is OK to optimize out, because most
threading libraries (including Go's) make no guarantees about the order in
which threads are scheduled. Therefore, removing the "writing" thread entirely
is equivalent to it never being scheduled.

However, this is not the case for atomic operations (which are generally
specifically intended for communication via shared variables). This
optimization of removing the writing thread entirely isn't allowed if atomic
operations are used, because those actually do come with a guarantee that
other threads will get to see the update in finite time.

Go was incorrectly "optimizing" the atomic operations out entirely. It's a
fairly subtle bug, actually, but still likely a big deal for the few programs
communicating via shared variables.

~~~
masklinn
> However, this is not the case for atomic operations (which are generally
> specifically intended for communication via shared variables). This
> optimization of removing the writing thread entirely isn't allowed if atomic
> operations are used, because those actually do come with a guarantee that
> other threads will get to see the update in finite time.

That sounds more like volatile than atomic. At its core, atomic just means
that an observer may see either the pre-update or the post-update state, but
won't see intermediate states.

~~~
yalue
I agree that this sounds more like "volatile", but it still makes sense that
programmers should be able to rely on atomic operations for inter-thread
communication.

According to the comments on the bug report, this behavior is inspired by
C++'s <atomic> library, which guarantees cross-thread visibility.

~~~
chimeracoder
> I agree that this sounds more like "volatile", but it still makes sense that
> programmers should be able to rely on atomic operations for inter-thread
> communication.

Only in the sense that they won't see inconsistent (intermediate) states.

Atomicity shouldn't guarantee order _per se_; if you want both, you need a
mutex.

~~~
gpderetta
Ordered atomic operations like C++ std::atomic or Java/C# volatile do
guarantee some ordering. For example, if one thread modifies an object and
then stores a pointer to it in an atomic variable, then under default
visibility rules another thread reading that pointer from the atomic variable
is guaranteed to see the modifications to the object itself, i.e. the update,
the store, and the remote load are ordered.

It is possible to have a model where all atomic operations are relaxed (i.e.
unordered) and use explicit barriers for ordering (which is closer to how
some, but not all, CPUs work), but it is significantly harder to reason about
and prove algorithms correct under it.

edit: wording

~~~
wbl
No. Read the spec. There is no deadline for atomics to be visible. What is
true is that some operations have happens-before ordering, and if a store
happens-before a load, the load will see the result of the store.

~~~
gpderetta
Did you reply to the wrong post? My post was only about ordering.

Still, the C++ spec has wording about making stores visible in a reasonable
amount of time. The wording is non-normative, as they ran out of time (ah!)
while trying to come up with a more formal (and implementable) specification.

edit: oh, I see I wrote 'default visibility' instead of 'default ordering'.

------
richard_todd
What's funny to me is: if you don't do the add in a loop, go1.8 generates the
add code:

    
    
        go func() {
            atomic.AddUint64(&a, uint64(1))  // no loop
        }()
    

To me, it's the unpredictability of how compilers will choose to apply these
extreme optimizations that is most frustrating. Reasoning from the spec ("the
goroutine might not be scheduled anyway") isn't very satisfying when the
compiler clearly doesn't follow logic like this. The implementers reason from
the spec and then glue together a mishmash of targeted optimizations--
presumably the ones they think will be most helpful.

I suppose many modern modes of programming with composable abstractions result
in lots of dead code, and it's a net win for the compiler to aggressively try
to get rid of it. With simpler languages like C or Go, though, I might prefer
a rule like: "I wouldn't have written the code if I didn't want any
instructions generated from it."

~~~
0xCMP
Luckily Go does provide ways to see the compiler's decisions: go build with
-gcflags='-m' prints optimization decisions like inlining and escape analysis,
and -gcflags='-S' dumps the generated assembly, where you can check whether
the atomic add survived. I'm not sure it goes far enough here, though.

------
londons_explore
What is the overflow behaviour here?

Surely in the case of this example code, the compiler sees an overflow is
going to occur, leading the variable back to zero.

By removing all the code, the compiler has simply combined all the additions
into a single addition which overflows back to the initial value.

Seems working as intended to me.

~~~
mook
As far as I can tell it's just as broken if you change the Add to a Store (at
least, doing that and breaking out in the main routine when the value is > 1
still times out on play.golang.org). That just turns it into a (terrible)
spinlock, and has no overflow issues.

On mobile, so I can't test with a real toolchain.
[https://play.golang.org/p/DlWBh1Via4](https://play.golang.org/p/DlWBh1Via4)

